



EDUCATIONAL AND PSYCHOLOGICAL 
MEASUREMENT 


A quarterly journal devoted to the development and application of
measures of individual differences. 


EDITOR 

G. Frederic Kuder . . . United States Civil Service Commission


ASSOCIATE EDITORS 


Dorothy C. Adkins . . . Social Security Board

Forrest A. Kingsbury . . . University of Chicago

M. W. Richardson . . . Adjutant General's Office, A. U. S.


BOARD OF COOPERATING EDITORS 


Richard D. Allen 
Providence Public Schools 

John G. Darley
University of Minnesota

Harold A. Edgerton
Ohio State University

Max D. Engelhart
Chicago City Junior Colleges

E. B. Greene
Social Security Board

J. P. Guilford
University of Southern California

E. F. Lindquist
State University of Iowa

P. J. Rulon
Harvard University

David Segel
U. S. Office of Education

C. L. Shartle
Social Security Board

H. C. Taylor
Western Electric Company

Thelma G. Thurstone
Chicago Teachers College

Herbert A. Toops
Ohio State University

E. G. Williamson
University of Minnesota

Ben D. Wood
Columbia University







INDEX FOR VOLUME II

Aldrich, Margaret Glockler
    An Exploratory Study of Social Guidance at the College Level

A Urn, Ruhnul I) and Kinnv. Lv^lri t\ 

Educ \tion iL Rinniii mi m.s \\n Oa 11 ' miovm, Li vi i,s, .371 

Barnes, M. W.
    A Technique for Testing Understanding of the Visual Arts

Beers, F. S.
    The Examiner's Office of the University System of Georgia

Berdie, Ralph F.
    An Aid to Student Counselors

Brown, Edwin J.
    Some of the Less Measurable Outcomes of Education . . . 153

Cardall, Alfred J.
    A Test for Primary Business Interests Based on a
    Functional Occupational Classification . . . 113

Churchill, Ruth D., Curtis, Jeanne M., Coombs, Clyde H., and
Harrell, Thomas W.
    Effect of Engineer School Training on the Surface
    Development Test

Cleveland, Eaile, Fauhlun, Rithaid 11’., and Ilanell, Tham\n IV, 
Ai’th'udi 'Er.si.s ior Army Wi miu'; Ormiuir S'ipdinis 335 
Crissy, William J. E. and Wantman, M. J.
    Measurement Aspects of the National Clerical Ability
    Testing Program . . . 37

Evans, Catharine and Wrenn, C. Gilbert
    Introversion-Extroversion as a Factor in Teacher-Training . . . 47

Faubion, Richard W., Cleveland, Earle A., and
Harrell, Thomas W.
    The Influence of Training on Mechanical Aptitude
    Test Scores . . . 91

Froehlich, Clifford
    A Study of the Gentry Vocational Inventory . . . 75

Guilford, J. P., Lovell, Constance, and Williams, Ruth M.
    Completely Weighted Versus Unweighted Scoring in
    an Achievement Examination . . . 15

Ilalin, Milton E, 

Levers oi' iv /v..-- 






Jurgensen, Clifford E.
    A Test for Selecting and Training Industrial Typists

Koran, Sidney W.
    Machines in Civil Service Testing

Ligon, Ernest M.
    The Administration of Group Tests

Lorr, Maurice and Meister, Ralph K.
    The Optimum Use of Test Data

Lurie, Walter A.
    The Concept of Occupational Adjustment . . . 3

McQuitty, John V.
    Procedure for Handling Tests and Examinations

Mosier, Charles I.
    Measurement in Rural Housing: A Preliminary Report

Oberheim, Grace M.
    The Prediction of Success of Student Assistants in
    College Library Work

Owens, William A., Jr.
    Intra-Individual Differences Versus Inter-Individual
    Differences in Motor Skills

Sarbin, Theodore R. and Anderson, Hedwin C.
    A Preliminary Study of the Relation of Measured Interest
    Patterns and Occupational Dissatisfaction

Schrammel, H. E.
    The Purpose, Origin, Plan of Procedure, and Values
    of the Nation-Wide Every Pupil Scholarship Tests

Sperling, Abraham
    A Comparison of the Human Behavior Inventory with
    Two Other Personality Inventories

Super, Donald E.
    The Place of Aptitude Testing in the Public Schools

Tussing, Lyle
    An Investigation of the Possibilities of Measuring Personality
    Traits with the Strong Vocational Interest Blank

Watson, Robert I.
    The Relationship of the Affective Tolerance Inventory
    to Other Personality Inventories . . . 83

Welker, E. L. and Harrell, T. W.
    Predictive Value of Certain "Law Aptitude" Tests

Wood, Ray G.
    The Aims, Objectives, and Outcomes of the Ohio Testing
    Program











EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT

Volume II          JANUARY, 1942          Number 1

The Concept of Occupational Adjustment . . . 3
    Walter A. Lurie

Completely Weighted Versus Unweighted Scoring in an
Achievement Examination . . . 15
    J. P. Guilford, Constance Lovell, and Ruth M. Williams

A Preliminary Study of the Relation of Measured Interest
Patterns and Occupational Dissatisfaction
    Theodore R. Sarbin and Hedwin C. Anderson

Measurement Aspects of the National Clerical Ability
Testing Program . . . 37
    William J. E. Crissy and M. J. Wantman

Introversion-Extroversion as a Factor in Teacher-Training . . . 47
    Catharine Evans and C. Gilbert Wrenn

An Investigation of the Possibilities of Measuring Personality
Traits with the Strong Vocational Interest Blank
    Lyle Tussing

A Study of the Gentry Vocational Inventory . . . 75
    Clifford Froehlich

The Relationship of the Affective Tolerance Inventory
to Other Personality Inventories . . . 83
    Robert I. Watson

The Influence of Training on Mechanical Aptitude Test
Scores . . . 91
    Richard W. Faubion, Earle A. Cleveland, and Thomas W. Harrell







Copyright 1942, by

SCIENCE RESEARCH ASSOCIATES


PRINTED IN THE UNITED STATES OF AMERICA



THE CONCEPT OF OCCUPATIONAL
ADJUSTMENT*


WALTER A. LURIE

Jewish Vocational Service and Employment Center, Chicago

I. THE PROBLEM

The need for criteria of occupational adjustment arises
from the attempt to evaluate educational and vocational
guidance programs. Many criteria have been proposed, most
of them falling into one of the following groups: earnings, job
performance, job satisfaction, stability of employment, level
of work done, social value of work done, and realization of
potentialities.

Several more recent papers have dealt specifically with the
weaknesses of studies based on various criteria. Stott (2), in
summarizing British experience with a number of the proposed
criteria, stressed particularly the sources of unreliability in
each of the suggested estimates. Williamson and Bordin (8)
have reviewed studies which they are careful to designate as
“evaluation of counseling programs” rather than of “adjustment,”
pointing out structural defects and specific weaknesses
in these investigations. Viteles, who had in 1932 suggested
accepting “satisfaction and economic efficiency as independent
criteria of adjustment in work” (6, p. 140), in 1936 proposed
a clinical measure, the “dynamic criterion” (7), based upon
the extent to which the individual had realized his capacity for
vocational success. Williamson and Bordin (8) advocate a
“judgment criterion,” also a clinical estimate.

These various approaches delimit four possible methods
of evaluating vocational adjustment.

*I wish to acknowledge my gratitude to Dr. Irving Lorge for supplying me
with some of the data used in this study. I also wish to thank Mr. S. T. Friedman
for assistance with the computations.



A. The first method is to accept one of the proposed 
criteria as the essence of occupational adjustment and to define 
the various degrees of excellence of adjustment in terms of that 
variable alone. Investigations of vocational adjustment are 
simply made in terms of the relative job-satisfaction of the 
individuals, or of their relative earnings, and so forth. The
choice of any single criterion is obviously arbitrary. 

B. The second possible method of evaluating vocational
adjustment is to observe two or more of the criterion variables
and to combine the ratings into a single score of vocational
adjustment. One might, for instance, weight job satisfaction as
5, skill as 2, earnings as 2, job status as 1, and designate the
weighted criterion score as a measure of occupational adjustment.
Even if a more complex functional relationship is postulated,
the arbitrary and inflexible nature of this procedure is
immediately apparent.
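
The weighted combination described under method B can be sketched in a few lines of Python. This is only an illustration: the weights 5, 2, 2, and 1 come from the text, while the 0-to-5 rating scale, the field names, and the two workers are hypothetical and assumed for the example. It shows how two quite different profiles can collapse to the same composite score, which is the arbitrariness the paragraph objects to.

```python
# Weights quoted in the text for method B; everything else is illustrative.
WEIGHTS = {"job_satisfaction": 5, "skill": 2, "earnings": 2, "job_status": 1}


def composite_score(ratings):
    """Weighted sum of criterion ratings (hypothetical 0-5 rating scale)."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


# Two hypothetical workers with very different profiles.
worker_a = {"job_satisfaction": 4, "skill": 1, "earnings": 1, "job_status": 0}
worker_b = {"job_satisfaction": 2, "skill": 5, "earnings": 2, "job_status": 0}

print(composite_score(worker_a), composite_score(worker_b))
```

Both hypothetical workers receive the same composite, although one is satisfied but unskilled and the other skilled but much less satisfied; the single score cannot tell them apart.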


C. The clinical or judgmental method of evaluating adjustment
is in most current use. Williamson and Bordin (8, p. 17)
describe the “judgment criterion” as one “by means of which
the adjustment of the student is estimated in terms of his
original problems and of the available data, including the part
criteria” (i.e., the various separate criteria which have been
proposed). In contrast to the first two methods, the clinical
method uses all the available data, rather than a restricted
set. The method of combining data is not mechanical and
inflexible, as in an arithmetical combination. The person
making the judgment is expected to take individual circumstances
into consideration, in an effort to obtain an integrated
representation of the adjustment of each unique personality.
By the exercise of clinical insight, when combining data, the
judges in effect assign a set of weights characteristic of the
individuals judged: earnings are unimportant for John Doe,
who comes from a wealthy family; the neurotic Jane Smith
will never be more satisfied in any other job than in this one;
job status is particularly important for Richard Roe because
... and similar ... While the skill and intuition of the counselor
obviously play an important role in assigning individuals to
places on the scale of vocational adjustment, fair agreement is
obtained among judges (9).

There is, however, one basic assumption upon which the
validity of the clinical method as an instrument of analysis
depends entirely. It assumes that there is such a psychological
entity as occupational adjustment, that it is therefore possible
to define a linear continuum corresponding to excellence of
adjustment in various degrees and to locate individuals at
definite points on this scale. The clinical method of evaluation
shares this fundamental assumption with the first two procedures,
the selection of a single criterion and the combination
of criteria by formula. It differs only in advocating clinical
judgment as the instrument for assigning individuals to points
on the scale of adjustment.

It is not my purpose, in questioning the validity of this
assumption, to disparage the role of clinical insight in counseling.
No one who has attempted to guide individuals in the
choice of careers and preparation for them, and later to
evaluate the results, can deny that clinical intuition gives more
full-bodied and meaningful results than the use of a single
criterion or the mechanical combination of criteria. This does
not preclude the possibility that the method rests upon a faulty
premise. It is always possible to project data of any degree of
complexity upon a single axis, either by formula or flexibly,
on a case-by-case basis. But this process will necessarily be
arbitrary and meaningless if the structure is not, in actual fact,
unidimensional. If there is no such linear continuum as occupational
adjustment, all three methods which we have considered
would be invalidated, the clinical as well as the other
two. The superior satisfaction which the counselor receives
from the use of the clinical method may reflect an actual lack
of precision, which covers up more effectively than the mechanical
procedure the incompatibility of data combined in his
judgment. The only check upon his estimates, namely, his
agreement with other judges, may show merely the extent to
which they share his preconceptions.




D. It is therefore advisable to seek a fourth procedure
for evaluating occupational adjustment, differing from the
first three in its fundamental premise. The nature of such a
procedure is clear in a simpler, but fully analogous, situation.

Let us suppose that it is our problem to evaluate the size
of human adults, instead of their occupational adjustment. The
simplest thing to do would be to define size in terms of, let us
say, weight. This would be comparable to identifying occupational
adjustment with job satisfaction. But why not choose
height, or volume? The choice is arbitrary. A second possibility
is to develop a formula combining height and weight.
This would require many individual exceptions, because of
differences in sex, age, skeletal structure, incidence of crippling
accident or disease, and other factors. Would the next proposal
then be to substitute a clinical or judgmental combination
of all the various factors for the purpose of arranging individuals
in order of size? It is more likely that investigators
would suspect size to be a variable which cannot be evaluated
as such, because no continuum corresponds to the concept.
Efforts would be directed first towards defining a set of variables
which have something to do with size as popularly conceived
and which can be observed and predicted. A further
step would be to investigate the dimensionality and structure
of the “size” variables. Basic variables (primary factors)
would be identified in terms of which all the “part-criteria” of
size could be studied in relation to various hereditary and
environmental influences.


Now let us reverse the analogy. It is the contention of this
paper that occupational adjustment, like size, is a composite
variable. Since many part-criteria have already been defined,
the next step in evaluating vocational adjustment as popularly
conceived is to investigate its dimensionality and structure.
Unless this next step is taken, we must work with a concept of
occupational adjustment which brings together in a single
rating or judgment different factors which have meaning separately
... combine them by formula or by the exercise of clinical insight.




II. THE EVIDENCE

The choice among the four proposed methods can be submitted
to consideration in the light of evidence. If the evidence
shows that there is a single factor common to all
proposed criteria which can in themselves be considered
meaningful, the concept of vocational adjustment as a psychological
entity will have been upheld. Whether we should
then use a single criterion, a combination by formula, or a
clinical estimate would be a practical problem, depending upon
which could be shown to arrange individuals most accurately
in order of excellence of vocational adjustment. If, however,
several independent factors are demonstrated, it would obviously
be wiser to regard these as separate criterion-variables,
which must be observed separately and thought of separately
as goals of guidance programs. A crucial test of these various
approaches can, therefore, be applied by the factor analysis
of criterion data.

Table 1 B shows the intercorrelations of five criterion
variables (Table 1 A) used in Thorndike's (3) study of
prediction of vocational success, for 175 men in the biennium
1924-25. Table 1 C shows the distribution of tetrad differences
from these correlations. It seems likely, in view of the
large deviations from zero, that one factor is not sufficient to
account for the intercorrelations of these carefully collected
data.
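
The tetrad-difference check can be reproduced from the ten correlations in Table 1 B. The following is a minimal sketch, assuming the standard definition of the three tetrad differences for each set of four variables; the function and variable names are chosen here for illustration and are not the author's. If a single common factor accounted for the correlations, every difference should be close to zero relative to the reported standard errors of .03 to .07.

```python
from itertools import combinations

# Correlations from Table 1 B with decimal points restored, keyed by variable pair.
R = {
    (1, 2): 0.45, (1, 3): 0.21, (1, 4): -0.60, (1, 5): -0.34,
    (2, 3): 0.32, (2, 4): -0.14, (2, 5): -0.18,
    (3, 4): -0.06, (3, 5): -0.15,
    (4, 5): 0.40,
}


def r(i, j):
    """Look up a correlation regardless of the order of the two variables."""
    return R[(min(i, j), max(i, j))]


def tetrad_differences():
    """Return the three tetrad differences for every set of four variables."""
    out = []
    for a, b, c, d in combinations(range(1, 6), 4):
        p1 = r(a, b) * r(c, d)
        p2 = r(a, c) * r(b, d)
        p3 = r(a, d) * r(b, c)
        out.extend([p1 - p2, p1 - p3, p2 - p3])
    return out


tetrads = tetrad_differences()
large = [t for t in tetrads if abs(t) > 0.10]
print(len(tetrads), len(large), round(max(abs(t) for t in tetrads), 3))
```

Five variables give five sets of four and therefore fifteen differences, the same count as the frequencies that survive in Table 1 C; several of them exceed .10 in absolute value.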

Table 2 B shows the tetrachoric correlation coefficients (1)
of twelve criterion and background items (Table 2 A) for 55
female job-applicants born in 1914, whose contact with the
Jewish Vocational Service occurred in 1937. Table 2 C gives
the centroid factor loadings (4), Table 2 D the distribution
of residuals after extraction of three factors, Table 2 E the
final factor loadings, and Table 2 F the matrix of transformation
by which the centroid factor loadings in Table 2 C were
rotated to obtain the final factor loadings as given in Table
2 E (5). Table 2 G shows that the factors are by no means
identical; the greatest deviation from orthogonality is less
than 25 degrees, and one plane is orthogonal to both of the
others. The factors can be identified only tentatively, in view
of the few subjects and the difficulty in obtaining precise information
about the clients of an employment agency. Table
2 H lists the high and zero loadings for each factor. Factor I
seems to be a reflection of the amount of work experience,
which was not controlled; Factor II is primarily a matter of
job-satisfaction or job-level; and Factor III is an employability
factor. At any rate, it is clear that the criterion vectors do not
lie along a single axis.
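
The rotation step can be checked numerically. The sketch below assumes that the final loadings are obtained by multiplying each variable's centroid loadings (Table 2 C) by the transformation matrix (Table 2 F), and that the cosines in Table 2 G are the inner products of the columns of that matrix. The numbers are read from those tables with decimal points restored; the function names are illustrative only.

```python
import math

# Centroid loadings for variables 1 and 2 (Table 2 C, decimal points restored).
CENTROID = {1: (0.66, -0.22, -0.18), 2: (0.73, 0.36, 0.14)}

# Transformation matrix (Table 2 F): rows are centroid axes, columns are final factors.
LAMBDA = [
    [0.930, 0.100, -0.156],
    [0.368, 0.812, 0.618],
    [0.031, 0.576, -0.770],
]


def rotate(loadings):
    """Project one variable's centroid loadings onto the three final factors."""
    return tuple(
        sum(loadings[k] * LAMBDA[k][j] for k in range(3)) for j in range(3)
    )


def cosine_between(j1, j2):
    """Inner product of two columns of LAMBDA (unit length up to rounding)."""
    return sum(LAMBDA[k][j1] * LAMBDA[k][j2] for k in range(3))


final_1 = tuple(round(100 * x) for x in rotate(CENTROID[1]))
final_2 = tuple(round(100 * x) for x in rotate(CENTROID[2]))
cos_12 = cosine_between(0, 1)
deviation = 90 - math.degrees(math.acos(cos_12))
print(final_1, final_2, round(cos_12, 3), round(deviation, 1))
```

Under these assumptions the rotated values for variables 1 and 2 agree with the corresponding rows of Table 2 E, and the largest cosine corresponds to a departure from orthogonality a little under 25 degrees, in line with the statement in the text.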


III. CONCLUSIONS AND DISCUSSION 

A. It is my belief that the evidence warrants a tentative
verdict in favor of the fourth approach, that of discarding the
concept of occupational adjustment as a psychological entity
and observing separately instead several dimensions of occupational
adjustment. While the force of this study may be
diminished because it was necessary to use weak criteria and
scanty data, and in particular because no clinical estimates
could be included, it is certainly not invalidated by these faults.
The burden of proof rests now upon those who would evaluate
occupational adjustment on a linear scale. They must show
that the polydimensionality demonstrated in this study reflects
the introduction of irrelevancies and that all meaningful
criteria can be projected on a linear scale.

B. The present study is not sufficiently comprehensive to
identify clearly the dimensions of vocational adjustment. It
seems likely, however, that job-satisfaction is closely identified
with one factor, and ease of obtaining employment with
another. Fruitful investigations into the nature of vocational
adjustment can be directed towards further clarification of
these factors and of their relationship to the potentialities,
background, and aspirations of the individuals.

C. ... do not, even if they are accepted as conclusive,
... insight in the counseling process. Recommendations to individuals
regarding their life-conduct can be made only by
skilled advisers, never automatically by formula. The clinical
method may have a place in evaluation as well as counseling;
job satisfaction, for instance, may be better estimated than
measured by objective techniques, and this may be true of any
other variable which is a psychological entity. Perhaps, however,
if these findings are correct, counselors will be able to
sharpen their conception of the goals which they are attempting
to promote for each individual. When they evaluate informally
the results of counseling in a particular case, they
may find three or four sentences more informative than an
over-all judgment regarding the excellence of occupational
adjustment.

D. Finally, it cannot be argued that we are justified in
projecting all criterion data on a single axis because it is
important to have an over-all evaluation of adjustment. What
is important is to know the nature, the character of each
individual's vocational adjustment. We must study the whole
person in action in terms of meaningful variables of the occupational
adjustment complex, but we need not place him on a
meaningless scale.

IV. SUMMARY

A. It was shown, by logical analysis of various possible
methods of evaluating excellence of occupational adjustment,
including clinical judgments, that they all assume such estimates
or judgments to form a linear continuum.

B. Evidence was presented that only polydimensional
models can represent adequately two sets of data, one from
Thorndike's major study of vocational success, one derived
from clients of the Jewish Vocational Service.

C. Without questioning the efficacy of clinical insight in
the guidance process, it was suggested that the concept of occupational
adjustment as a psychological entity should be supplanted
by a concept of occupational adjustment as a complex
of factors which must be observed separately and can be considered
separately as goals of guidance programs.

D. Further studies should be directed towards the more
precise identification of these factors. This first tentative
identification would suggest that one has to do with job-satisfaction
and one with ease of obtaining employment, in addition
to others not as yet identified.


TABLE 1*

Data from Thorndike's study
175 men in biennium 1924-1925

1 A. Variables

  1. Earnings
  2. Level of job
  3. Interest in job
  4. Time unemployed
  5. Change of employer

1 B. Intercorrelations**

  Var.     1     2     3     4     5
   1            45    21   -60   -34
   2                  32   -14   -18
   3                        -06   -15
   4                               40
   5

  The standard errors of these coefficients range from .03 to .07.

1 C. Tetrad differences***

  Absolute value   Frequency
        00             1
        01             2
        02             0
        03             1
        04             1
        05             0
        06             2
        07             3
        08             0
        09             0
        10             0
        11             1
        12             1
        13             1
        14             0
        15             0
        16             2

*Decimal points have been omitted.
**... Lorge for supplying me with this table of intercorrelations.
***... identical except for sign has been omitted.


TABLE 2*

Data from JVS & EC clients
55 women born in 1914, data as of 1937

2 A. Variables

  1. Time of leaving full-time school; 19?? or later recorded as minus
     (41% recorded as minus)
  2. Number of months employed to 1937; 39 or less recorded as minus
     (61%)
  3. Minimum weekly wages on jobs reported; $1? or less, minus
     (58%)
  4. Maximum weekly wages reported; $15 or less, minus (?2%)
  5. Number of different employers reported; 2 or less, minus (49%)
  6. Minimum wage stated to be acceptable; less than $1?, minus
     (29%)
  7. Satisfaction with wages; will not consider wage lower than previous
     maximum, minus (35%)
  8. Waiting time for first job; unemployed until year following time
     of leaving school, minus (39%)
  9. Success of JVS efforts to place in job; no placement, minus (63%)
 10. College or specialized training beyond 4-year high school; none,
     minus (37%)
 11. Satisfaction with type of work done; seeking other type of work,
     minus (13%)
 12. Freedom from recorded handicap, including speech defect, language
     handicap, deformity, etc.; handicap present, minus (13%)

*Decimal points have been omitted.

TABLE 2 (Continued)

2 B. Tetrachoric correlations

  Var.   1    2    3    4    5    6    7    8    9   10   11   12
    1        69  -13   34   00   48  -12  -52    ?  -57   25    ?
    2             56   62  -18   58   08   00  -07   07   64   18
    3                  34  -16   35  -24  -12  -23    ?    ?   10
    4                       01   59   63   07  -10   13   85   09
    5                            21   07   43   32   10   12   80
    6                                  ?   16    ?    ?   12   57
    7                                      40    ?   20   70   60
    8                                           20   31    ?   65
    9                                                40    ?   10
   10                                                    -10  -50
   11                                                          47
   12

The standard errors of these correlations range up to about .15.


2 C. Centroid factor loadings

  Var.     I    II   III   Communality
    1     66   -22   -18       52
    2     73    36    14       68
    3     53    35    32       50
    4     49    60    38       75
    5    -18    54   -60       68
    6     63    33   -43       69
    7    -25    42    57       56
    8    -30    72   -31       71
    9    -43    22   -27       30
   10    -44    49    37       57
   11     62    62    49      100
   12     32    43   -86      102

2 D. Distribution of absolute values of third factor residuals

  Absolute value   Frequency
      00-04           11
      05-09           11
      10-14           10
      15-19            8
      20-24           15
      25-29            4
      30-34            5
      35-39            0
      40-44            2






2 E. Final factor loadings

  Var.     I    II   III
    1     53   -22   -10
    2     82    45    00
    3     63    52   -11
    4     69    76    00
    5     01    07    82
    6     69    08    44
    7    -06    64   -14
    8    -02    38    73
    9    -33   -02    41
   10    -22    57    09
   11     82    85   -09
   12     43   -11    88

2 F. Matrix of transformation (Λ)

           I     II    III
    I     930   100   -156
    II    368   812    618
    III   031   576   -770

2 G. Direction cosines of normals (Λ'Λ)

           I     II    III
    I           410    058
    II                 043
    III




TABLE 2 (Continued)

2 H. Tentative identification of factors

Factor I.
  A. Variables with loadings over 40
       2. Number of months employed ............  82
      11. Satisfaction with type of work .......  82
       6. Minimum wage acceptable ..............  69
       4. Maximum weekly wages .................  69
       3. Minimum weekly wages .................  63
       1. Time of leaving school ...............  53
      12. Freedom from handicap ................  43
  B. Variables with loadings under 15
       5. Number of different employers ........  01
       8. Waiting time for first job ........... -02
       7. Satisfaction with wages .............. -06
  C. Tentative identification
       Experience

Factor II.
  A. Variables with loadings over 40
      11. Satisfaction with type of work .......  85
       4. Maximum weekly wages .................  76
       7. Satisfaction with wages ..............  64
      10. Special training .....................  57
       3. Minimum weekly wages .................  52
       2. Number of months employed ............  45
  B. Variables with loadings under 15
       9. JVS placements ....................... -02
       5. Number of different employers ........  07
       6. Minimum wage acceptable ..............  08
      12. Freedom from handicap ................ -11
  C. Tentative identification
       Job level or job satisfaction

Factor III.
  A. Variables with loadings over 40
      12. Freedom from handicap ................  88
       5. Number of different employers ........  82
       8. Waiting time for first job ...........  73
       6. Minimum wage acceptable ..............  44
       9. JVS placements .......................  41
  B. Variables with loadings under 15
       2. Number of months employed ............  00
       4. Maximum weekly wages .................  00
      10. Special training .....................  09
      11. Satisfaction with type of work ....... -09
       1. Time of leaving school ............... -10
       3. Minimum weekly wages ................. -11
       7. Satisfaction with wages .............. -14
  C. Tentative identification
       Ease in finding employment












REFERENCES

1. Chesire, L., Saffir, M., and Thurstone, L. L. Computing Diagrams
   for the Tetrachoric Correlation Coefficient. Chicago: University of
   Chicago Press, 1933.

2. Stott, M. B. “Occupational Success.” Occupational Psychology
   (London), XIII (1939), 126-140.

3. Thorndike, E. L., et al. Prediction of Vocational Success. New
   York: The Commonwealth Fund, 1934.

4. Thurstone, L. L. The Vectors of Mind. Chicago: University of
   Chicago Press, 1935.

5. Thurstone, L. L. “A New Rotational Method in Factor Analysis.”
   Psychometrika, III (1938), 199-218.

6. Viteles, M. S. Industrial Psychology. New York: W. W. Norton
   and Co., 1932.

7. Viteles, M. S. “A Dynamic Criterion.” Occupations, XIV (1936),
   962-967.

8. Williamson, E. G. and Bordin, E. S. “The Evaluation of Vocational
   and Educational Counseling: A Critique of the Methodology
   of Experiments.” Educational and Psychological Measurement, I
   (1941), 5-24.

9. Williamson, E. G. and Bordin, E. S. “A Statistical Evaluation of
   Clinical Counseling.” Educational and Psychological Measurement,
   I (1941), 117-132.





COMPLETELY WEIGHTED VERSUS
UNWEIGHTED SCORING IN AN
ACHIEVEMENT EXAMINATION


J. P. GUILFORD, CONSTANCE LOVELL, AND RUTH M. WILLIAMS

University of Southern California

IN A PREVIOUS REPORT,¹ the senior author presented
the derivation of a scoring weight for differential weighting
of responses to test items. The formula for the weight,
in a form recommended for practice, is as follows:

\[ W = 4 + \frac{p_u - p_l}{pq}, \]

in which W = the scoring weight,

p_u = the proportion of an upper (or otherwise specified)
criterion sub-group reacting in a defined manner,
p_l = the similar proportion in a lower (different) criterion
sub-group,

p = the proportion of the two sub-groups combined responding
in this manner, and
q = 1 - p.
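
The weight can be illustrated with a short Python sketch. The printed formula is damaged in this copy, so the code assumes the form W = 4 + (p_u - p_l)/(pq), which is consistent with the symbol definitions above and with the 0-to-8 scale mentioned later in the paper; it should be read as a reconstruction, not as a quotation. The function name, the equal-group assumption, and the handling of the zero-variance case are choices made here for illustration.

```python
def scoring_weight(p_upper, p_lower):
    """Assumed scoring weight W = 4 + (p_u - p_l) / (p * q) for one response.

    Assumes equal-sized criterion groups, so p is the mean of the two
    proportions. Returns the neutral weight 4.0 when nobody (or everybody)
    chose the response, a convention assumed here to avoid dividing by zero.
    """
    p = (p_upper + p_lower) / 2.0
    q = 1.0 - p
    if p * q == 0.0:
        return 4.0
    return 4.0 + (p_upper - p_lower) / (p * q)


# A response chosen equally often by both groups carries no information.
print(scoring_weight(0.5, 0.5))
# A response chosen only by the upper group sits at the top of the scale.
print(scoring_weight(1.0, 0.0))
```

Under this assumed form a non-discriminating response receives the midpoint weight, and the extremes of the scale are reached only when a response separates the two groups perfectly.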

Such a weight has heretofore been most widely employed
in connection with personality tests of the type of the
Strong Vocational Interest Blanks. In the following study the
weight was used in connection with an objectively-scored
multiple-choice achievement examination. In this kind of test we
can consider the probability of a specified response being
made (p) or not being made (q), for a group as a whole, and
also the probabilities of the same response being made within
two separated groups. Our main problem was to determine
whether an examination with completely weighted scoring of
this kind yields any more highly reliable and valid scores than
the same examination yields with unweighted scoring. A subsidiary
problem was to determine whether the length of examination
has any bearing upon the effect of weighted versus
unweighted scoring. By “completely weighted” we mean that
every response, whether considered right or wrong, is given
a weight in proportion to its predictive significance. This procedure
is in contrast to ordinary differential weighting where
only correct responses are weighted in proportion to their
diagnostic value. By “unweighted scoring” we mean that items
are given weights of 0 or 1: 0 if a wrong response of any
kind is given and 1 if the correct response is given.

¹J. P. Guilford. “A Simple Scoring Weight for Test Items and Its
Reliability,” Psychometrika, VI (1941), 367-374.

As our experimental material we used the results of a final
examination in a course on “Problems of Human Behavior.”²
The test was composed of 201 multiple-choice items and 107
true-false items. These items had been analyzed for validity
in previous use; therefore, we could expect to find an unusually
large number of diagnostic items both when scoring was
weighted and when it was not. Instructions to guess or not
to guess were not stated on the test blanks, but the usual correction
formula had been applied to allow for guessing. There
were extremely few omissions. These facts are mentioned because
we used the total examination score as our criterion of
achievement in the course.

Our study was confined to the first 100 consecutive items
in the examination, skipping one item which had an unusual
number of omissions and which had been excluded from the
scoring of the total examination. All but 8 of the items proved
to have phi coefficients of .14 or larger (in other words, significant
correlations) and all but 19 had very significant phi
coefficients (greater than .18). The total range of phi coefficients
for the 100 items was from -.06 to +.48, with a
median of +.28.

²We are indebted to Dr. Neil Warren for the opportunity to use this
material.
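
The cut-offs of .14 and .18 can be made plausible with a small calculation. The sketch below assumes two equal criterion groups of 100 papers each (the upper and lower groups described in this study), the usual expression of phi for equal groups, phi = (p_u - p_l) / (2 * sqrt(pq)), and the standard relation chi-square = N * phi^2 with one degree of freedom. The critical values 3.841 and 6.635 are the conventional 5 and 1 per cent points; the function names are illustrative, and the authors' own explanation of significance is not preserved in this copy.

```python
import math


def phi_equal_groups(p_upper, p_lower):
    """Phi coefficient for a response when the two criterion groups are equal in size."""
    p = (p_upper + p_lower) / 2.0
    q = 1.0 - p
    if p * q == 0.0:
        return 0.0  # no variation in the response, so no association
    return (p_upper - p_lower) / (2.0 * math.sqrt(p * q))


def chi_square_from_phi(phi, n):
    """Chi-square (1 degree of freedom) implied by a phi coefficient on n cases."""
    return n * phi ** 2


N = 200  # 100 upper plus 100 lower papers
print(round(chi_square_from_phi(0.14, N), 2))  # compare with 3.841
print(round(chi_square_from_phi(0.19, N), 2))  # compare with 6.635
```

On these assumptions a phi of .14 just passes the 5 per cent point, and values a little above .18 pass the 1 per cent point, which matches the two thresholds quoted in the text.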

Three hundred test papers, selected at random, were used
in this investigation. The papers were re-scored in order to
be absolutely sure about total, or criterion, scores. The upper
and lower criterion sub-groups were composed of the 100
highest and the 100 lowest ranking students in the list of 300.
The proportion of each group responding in each of the four
ways to every item was determined. The scoring weight for
each particular response was read from a graphic chart.*
For example, one item read: “Genius and feeble-mindedness
are (1) points on a normal distribution curve; (2) points on
a bimodal distribution; (3) points on a multimodal
distribution; (4) in separate distributions.” The scoring weights for
responses 1 to 4 inclusive were: 6, 3, 4, and 2, respectively.
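
The contrast between the two scoring schemes for this sample item can be sketched in a few lines of Python. The weights 6, 3, 4, and 2 are those quoted for the item; the function names, and the reading that response 1 is the keyed answer because it carries the highest weight, are assumptions made for illustration only.

```python
# Scoring weights for responses 1-4 of the sample item quoted in the text.
ITEM_WEIGHTS = {1: 6, 2: 3, 3: 4, 4: 2}
KEYED_RESPONSE = 1  # assumption: the highest-weighted response is the correct one


def unweighted_score(response: int) -> int:
    """Return 1 for the keyed response and 0 for any other response."""
    return 1 if response == KEYED_RESPONSE else 0


def weighted_score(response: int) -> int:
    """Return the differential weight attached to the chosen response."""
    return ITEM_WEIGHTS[response]


# Show both scores for each possible response to the item.
for response in sorted(ITEM_WEIGHTS):
    print(response, unweighted_score(response), weighted_score(response))
```

Under unweighted scoring responses 2, 3, and 4 are indistinguishable; under weighted scoring they contribute 3, 4, and 2 points.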

A logical reason for expecting improved reliability and
validity from weighted scoring is now more apparent. Not all
the wrong responses are of equal diagnostic value, an
assumption that is implicitly made in unweighted scoring. It is
apparently worse for a student to err by choosing answer 4 and
less serious for him to choose answer 3. In only 15 items out
of the 100 did the three wrong answers prove to have equal
weight. No weights exceeded 6 points nor fell below 2 points
on a scale that extended from 0 to 8. This range was to be
expected from the moderate and small sizes of the phi
coefficients found for correct responses, as previously mentioned.

The factor of length of test was investigated roughly by
selecting two shorter examinations composed of the first 20
and the first 50 items out of the 100. Each of the three tests
of different length was scored in both halves (odds and evens)
and in total, with and without scoring weights. For this
purpose, 100 papers were selected out of the original list of 300,
by taking every third paper when the 300 were in rank order
for total scores. The correlations to be mentioned next are
based upon these 100 papers.
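
Taking every third paper from a rank-ordered list is a systematic sample. A minimal sketch follows, assuming the papers are held in a list already sorted by total score. The starting position is not stated in the text, so starting with the first paper is an assumption, and the scores are placeholders rather than data from the study.

```python
def every_third(ranked_papers: list, start: int = 0) -> list:
    """Take every third paper from a list that is already in rank order."""
    return ranked_papers[start::3]


# Placeholder scores standing in for 300 papers in rank order.
ranked = sorted(range(300), reverse=True)
subsample = every_third(ranked)
print(len(subsample))
```

Because the list is in rank order, the subsample spans the full range of total scores instead of clustering at one end.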

*See reference in footnote 1.




TABLE 1

Reliability and Validity Coefficients


                      Reliability                Validity

                  Weighted  Unweighted     Weighted     Unweighted
Length of Test    Scoring   Scoring        Scoring      Scoring

20 items           .667      .640          [illegible]  [illegible]

50 items           .860      .844           .892         .901

100 items          .922      .899           .900         .924


The reliability coefficients were estimated by the
Spearman-Brown formula in each case. The correlation of each short
test with the total (criterion) score was computed. The
reliability and validity coefficients are summarized in Table 1.
Here it is obvious that the weighted scoring yielded a scant
average gain of .02 in the reliability coefficients. This trifling
gain is consistent, but seems to be insignificant. In validity
the weighted scoring yielded a gain of about .02 in the shortest
test and a like amount of loss in the longest test, neither of
these changes being significant.
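
For readers who want the step from a split-half correlation to a full-length reliability made explicit, the Spearman-Brown formula can be sketched as below. The function name and the sample correlation of .80 are illustrative assumptions, not figures from the study; with k = 2 the formula gives the odd-even case described here.

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Estimated reliability of a test lengthened k times, given reliability r."""
    return k * r / (1.0 + (k - 1.0) * r)


# Hypothetical odd-even correlation, not a figure from the study.
r_halves = 0.80
print(round(spearman_brown(r_halves), 3))  # 0.889
```

The same function with other values of k estimates the reliability of a test lengthened or shortened by that factor, on the assumption that the added items are homogeneous with the original ones.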

It is well to consider possible special reasons for the
failure to obtain increased reliability and validity here. As
indicated above, the phi coefficients between items and criterion
scores were generally low and the range of differential weights
was relatively small. It might be that in other examinations
which include items with weights extending closer to 0 and 8
there would be an appreciable gain from differential weighting.
On the other hand, 32 of our 100 items had weights ranging
from 2 to 6; 21 more had weights ranging from 2 to 5 or
from 3 to 6; and 45 had weights ranging from 3 to 5, to be
contrasted with an unweighted range of 0 to 1.

One important source of gain to be expected from
complete differential weighting comes from the variations in
weights among the wrong responses, which in unweighted
scoring or in partially weighted scoring are all given the same
value of zero. Gaps of more than one point between the
correct response on the one hand and all the wrong responses
on the other simply magnify numerically the variability among
individuals' total scores, except that the more diagnostic items
are then allowed to contribute relatively more to the total




variability than do the less diagnostic ones. When there is a
spread of the weights among the wrong responses, there should
be more refinement in providing effectual variability among
total scores. The weights for wrong responses among our
100 items showed very narrow ranges: 15 items had equal
weights for wrong responses, 68 had differences not exceeding
one, and 17 had differences not exceeding two points. While
these differences are small, it would seem that their effects
should have been felt in scoring.

Another possible factor in the failure of weighting is that
the criterion scores were derived from unweighted scoring.
Correlations of either weighted or unweighted scores for the
shorter tests of 20, 50, and 100 items and total scores are in a
sense spurious in that we are correlating part with whole. This
factor might have favored higher correlations, especially for
the unweighted scoring of the parts.

In using total scores as our criterion of achievement here,
we have assumed up to this point that, though the part-whole
correlations are spurious, they are equally so for both types of
scoring. One evidence that this assumption may not be sound
is the fact that the shorter the part-test, the relatively greater
is the advantage of weighted over unweighted scoring. (See
Table 1.) On the other hand, it may be characteristic of short
tests per se to gain relatively more from weighted scoring.
Other evidence is found among the reliability coefficients. Here
we have unweighted scoring correlated with unweighted
scoring and weighted scoring correlated with weighted scoring.
The correlations indicate that regardless of the length of the
test within the range of 20 to 100 items, there is about the
same slight advantage for weighted scoring. Although changes
in validity do not always parallel changes in reliability, it
would seem that if the evidence of reliability here is
dependable, the systematic variations in validity may be due to the
relatively greater spuriousness for unweighted scoring in the
longer tests (since they are greater parts of the total and are
similarly scored) rather than to any greater relative gain in
validity of short tests by weighted scoring. The evidence is



entirely too meager, however, for us to draw any final
conclusions on this point.


In view of the uncertainty introduced by the factor of
spuriousness of correlation, it would have been interesting to
see what would have happened with an outside criterion. Some
idea of the extent of spuriousness can be obtained, without
taking the trouble to score the tests minus the 20, or 50, or
100 items used in the experimental tests, by applying a
formula to estimate the amount of correlation between part and
the whole from which the effects of the part are eliminated.⁵
For the 50-item test, with unweighted scoring, for example,
this estimated correlation is .862, which may be taken as an
indication of the correlation between the 50-item test and an
outside criterion of about 250 items. The correlation of the
same test with a homogeneous test of 300 items (the
approximate length of the total examination) is estimated by formula
to be .865.⁶ The amount of spuriousness is then indicated
by the difference between .865 and .901. We cannot similarly
estimate the amount of spuriousness in the correlation of the
weighted scoring in the 50-item test since it is not a simple
part-whole relationship. But had the spuriousness in this case
been zero, the validity coefficient of .892 is not quite .03 higher
than that of the estimated unweighted scoring without its
spurious element. It is doubtful, therefore, whether the
hypothesis of greater part-whole spuriousness attributed to the
unweighted scoring is sufficient to account for the failure of
weighted scoring to exhibit superior validity coefficients.
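
A correction of the kind described, removing the part from the whole before correlating, can be sketched as follows. This is the standard part-remainder formula derived from the covariance of a part with the total minus that part. Whether it is the exact formula the authors applied is an assumption, and the standard deviations below are hypothetical, since the text does not report them.

```python
import math


def part_remainder_correlation(r_pt: float, sd_part: float, sd_total: float) -> float:
    """Correlation of a part with the whole after the part is removed from the whole."""
    # Covariance of the part with (total - part), divided by the part's SD.
    numerator = r_pt * sd_total - sd_part
    # Standard deviation of (total - part).
    denominator = math.sqrt(
        sd_total**2 + sd_part**2 - 2.0 * r_pt * sd_total * sd_part
    )
    return numerator / denominator


# Hypothetical standard deviations; only the .901 comes from the text.
print(round(part_remainder_correlation(0.901, sd_part=8.0, sd_total=40.0), 3))
```

The corrected value is always lower than the raw part-whole correlation when the part correlates positively with the remainder, which is the sense in which the raw figure is called spurious.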


Had all the items in our tests been significantly correlated
with the criterion, a difference in favor of weighted scoring
might have resulted. Therefore, we selected for comparison
[...] diagnostic value. The reliability
coefficients [...] respectively, and the corresponding validity coefficients


[...] (New York: McGraw-Hill Book Company, 1936).




were .900 and .904. These coefficients were insignificantly
better than the ones derived from the 50-item test in which
the items were taken at random. The inference might be
that the items used in all our experimental tests were at a
sufficiently high level of diagnostic value, taking them collectively,
that weighted scoring was of no consequence.

Our general conclusion is that our logically defensible
system of completely weighted scoring did not yield an
appreciable gain in either reliability or validity in achievement
examinations of from 20 to 100 items. While the result is
negative so far as the improvement of test technique is concerned,
it is useful to know that the customary unweighted scoring,
which takes distinctly less time and effort, gives about as
reliable and valid results as differential weights afford.⁷ Although
this result may not be generalized to all weighting methods
and to all kinds of tests, it does suggest the possibility of
satisfactory scoring without weighting in places where we now
attempt to extract the utmost validity by the use of
differential weights. With the increased use of machine scoring,
where differential weighting becomes a serious practical
problem, it may be well in any case to consider the efficiency of
weights 0 and 1 before recommending a system of differential
scoring weights.


⁷ This general outcome is in line with a conclusion reached on rational
grounds by M. W. Richardson, in Paul Horst's The Prediction of Personal
Adjustment (New York: Social Science Research Council, 1941), 379-401.





A PRELIMINARY STUDY OF THE RELATION OF
MEASURED INTEREST PATTERNS AND
OCCUPATIONAL DISSATISFACTION¹

THEODORE R. SARBIN
University of Minnesota

and

HEDWIN C. ANDERSON
Minnesota Division of Vocational Rehabilitation

That occupational dissatisfaction is associated with a
lack of interest typical of successful men in a particular
job is a generally accepted hypothesis. This is of special
concern to psychologists, particularly if the hypothesis can be
verified and predictions of occupational satisfaction made on
the basis of interest measurement. In order to test the
hypothesis two kinds of data must be analyzed: (1) evidence of
occupational dissatisfaction and (2) measures of vocational
interest.

Although rating scales for determining job satisfaction
have been developed by Hoppock (4) and others, they are
difficult to use in a clinic where the clients or patients form a
heterogeneous population. They come from many different
walks of life, and there are seldom more than a few
individuals who are employed by the same organization. Job
satisfaction can be described by Hoppock’s definition as “any
combination of psychological, physiological or environmental
circumstances that causes a person truthfully to say ‘I am
satisfied with my job.’” (4:47)

¹ This study is one of a series of studies in process on clinical problems of
interest measurement at the University of Minnesota Testing Bureau.



In commercial and industrial organizations, a psychologist
may experience difficulty in persuading workers “truthfully”
to state their feelings about their work because they are afraid
of losing their jobs. During periods of widespread
unemployment, especially, an individual may express satisfaction with
his job merely because it is a job. By dealing with groups of
workers and guaranteeing anonymity by the use of unsigned
questionnaires, a psychologist may gather group data, but the
anonymity may prevent his relating these data to such variables
as personality traits or interests in subsequent clinical study.

The clinical situation in which the present data were
gathered gives greater assurance of meeting Hoppock's
qualification regarding the truthfulness of the clients' responses. In the
first place, all the subjects came to the University of Minnesota
Testing Bureau voluntarily. They had heard of the Bureau
through friends or business associates. They recognized the
Bureau as a disinterested organization which each year assists
a small number of out-of-school adults with problems of
vocational adjustment. Secondly, they paid a special fee for the
service. This fact presumably predisposed them to tell the
truth about their occupational experiences. Finally, if a client
had difficulty in expressing himself, a trained clinical
interviewer assisted him to say the things that he could not or
would not otherwise have said. It is reasonable to assume from
these three facts that expressions of occupational
dissatisfaction were truthful expressions. Having found a usable index
of occupational satisfaction, the next step was to find a measure
of vocational interest.

According to a recent poll (1), the most widely used
measure of vocational interest is the Strong Vocational
Interest Blank. This instrument is based upon this fundamental
assumption:

“If a man likes to do the things which men like who are
successful in a given occupation and dislikes to do the things
which these same men dislike to do, he will feel at home in
that occupational environment. Seemingly, also, he should be



more effective there than somewhere else because he will be
engaged, in the main, in work he likes.” (6)

The Strong Vocational Interest Blank was standardized
upon people who were purportedly successful in their
occupations. Strong's criteria of occupational success include the
following: length of experience in an occupation, annual income,
level of education, certification of membership in professional
society, and selection by so-called competent authorities. These
were used singly or in various combinations.

This is not the place to list the description of Strong's
criterion groups but as an illustration we take three
occupations. The samples of successful men have all been engaged
in the respective occupations for at least the three previous
years, and none is over 60 years of age.

Accountant: “Includes 160 general accountants, 54 cost
accountants, 65 auditors, and 66 comptrollers and treasurers.
Average age equals 37.4 years; education equals 13.3 grade.”

Office Worker: “Includes 214 office clerks, bookkeepers
and stenographers; 92 office managers; and 200 credit
managers. Average age equals 33.2 years; education equals 11.5
grade.”

Physician: “Graduates of Yale and Stanford Medical
School. Includes 252 physicians and 75 surgeons (no
difference of interest between them). 253 are from California, 47
from Connecticut and 9 from New York; the remaining are
scattered. Average age equals 40.9 years; education equals
18.5 grade.”

In interpreting the results on the Strong Vocational
Interest Blank, then, it is always necessary to think of the
criterion groups which served as the norm, as well as the
percentages of the group included under each grade. Thus if
an individual scored A on the key for physicians it means that
he made a score in the range of the top 69 per cent of the
physicians who made up the norm group. A score of B falls
in the range of the next 29 per cent; and a score of C falls in
the range of the lowest two per cent of the criterion group
(3).

For purposes of the present analysis, 100 cases were 




selected from the files of the University of Minnesota Testing
Bureau for the period 1937 to 1940 on the basis of
completeness of data. This sample contained 76 men and 24 women.
The cases were so-called “non-college adults”, individuals who
are accepted by the University Testing Bureau for research
and clinical purposes. Those who had gross physical
abnormalities, such as paralysis, spasticity, and deafness were not
included in this selection of cases. Only individuals who were
25 years of age or more were included. Above this limiting
age individuals usually have had some opportunity to establish
a work history. The mean age for men was 31.3 with a
standard deviation of 5.7 and a range of 25-53; for women,
30.4 with a standard deviation of 7.7 and a range of 25-44.

The educational level of this group appears to be higher
than that of the general population. For men, the mean grade
completed was 13.8, S.D. 2.5; for women, 14.5, S.D. 2.3.

The occupational status of this group was also higher than
that of the general population. According to the Minnesota
Occupational Rating Scales, 72 per cent rated in the top three
categories. In the general population only 22 per cent fall
into these three categories.²

From an analysis of these data, the following hypothesis
can be tested:

Adults who express dissatisfaction with their current 
occupations show no primary pattern of interest, as measured 
by the Strong Vocational Interest Blank, for the group of 
occupations in which their current occupation belongs. 

The Strong Vocational Interest Blank was first analyzed
in order to determine the primary pattern of interest. Darley’s


² Occupational Class I: professional; Occupational Class II:
semi-professional and managerial; Occupational Class III:
clerical, skilled trades, retail [...]. [...] and J. E. Anderson,
Experimental Child [...] (New York: Appleton-Century, 1931),
pp. 501-12.

³ [...] concerned with present occupation (or most
recent [...] at the time of counseling). Data
[...] with present occupation [...] in
terms of the modal occupation [...]




scheme of determining the presence and intensity of patterns
of interest on the Strong Blank was utilized (3). A primary
pattern is defined as a preponderance of A and B+ scores
within the occupations making up a group of factors as
revealed by existing factor analysis studies. To illustrate: the
verbal or linguistic interest type, Group X on the Strong test,
is made up of the following typical occupational titles:
advertising man, editor, lawyer. If a client had scores of A, B+, A
respectively, on these three keys, he would be considered to
have a primary pattern of interest in this group of occupations.
If his scores were B+, B, B, he would be rated as having a
secondary pattern of interests. A tertiary pattern of interests
is defined as a majority of B and B- scores on the keys within
any factor or group. In the present study the number of cases
was too small to be treated in terms of Darley’s fourfold
classification: primary, secondary, tertiary, and no pattern.
Instead, we considered only two categories:

(a) Presence of primary pattern (this means presence of
primary pattern as defined above in the group which
embraces the client’s present or most recent
occupation).

e.g. Client’s present occupation: Automobile
Salesman;

Scores on Strong Blank, Group IX
  Real Estate Salesman       B+
  Life Insurance Salesman    A
  Sales Manager              A

(b) Absence of primary pattern (this means absence of
primary pattern in the group which embraces the
client’s present or most recent occupation).

e.g. Client’s present occupation: Lawyer;

Scores on Strong Blank, Group X
  Advertising man            B-
  Lawyer                     C
  Editor                     C
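
The two-category rule can be sketched as a small function. The text defines a primary pattern only as a “preponderance” of A and B+ scores within a group; reading that as more than half of the keys in the group is an assumption made here for illustration, and the function name is hypothetical. The grades follow the two examples given for Group IX and Group X.

```python
PRIMARY_GRADES = {"A", "B+"}


def has_primary_pattern(grades: list[str]) -> bool:
    """True when more than half of the keys in a group are rated A or B+."""
    high = sum(1 for grade in grades if grade in PRIMARY_GRADES)
    return high > len(grades) / 2


# Grades for the two illustrative clients.
salesman_group_ix = ["B+", "A", "A"]  # Real Estate, Life Insurance, Sales Manager
lawyer_group_x = ["B-", "C", "C"]     # Advertising man, Lawyer, Editor

print(has_primary_pattern(salesman_group_ix))  # True
print(has_primary_pattern(lawyer_group_x))     # False
```

A client scoring B+, B, B on a group would also be classed as lacking a primary pattern under this rule, which agrees with the text's treatment of that profile as a secondary pattern.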

Each case was classified according to the client's stated
complaint. The following categories were used:



(a) Dissatisfied with occupational field. 

(b) Dissatisfied with present job. 

(c) Dissatisfied with present job only because of future
prospects.

(d) No specifically stated dissatisfaction, but seeks
vocational and/or educational information or advice.


Each case was also classified according to the clinician's
diagnosis, using five broad classifications:


(a) Inappropriate vocational choice: e.g., “He does not
have the interests of salesmen.” “Has never been
interested in mechanical work.” “Working with
teachers not congenial to this man's values, attitudes,
and ideals.”

(b) Primary personality disorders: e.g., “social
maladjustment”, “unhappily married”, “neurotic
tendencies.”

(c) Insufficient education or training: e.g., “lacks
stenographic skills to compete with co-workers”, “lacks
sufficient graduate training to get into junior college
teaching”, “lacks skills in cost estimating which are
required for promotion and increased pay.”

(d) Inappropriate job placement: e.g., “clerical skills not
being used”, “stenographic skills are not adequate”,
“truck-driving satisfactory, but would be happier if
he had a run closer to home and family”, “selling
satisfactory, but his product is inappropriate.”

(e) Other: This includes a small number of which three
were characterized as “no problem”, the rest as
financial, health, or unclassified.


These diagnostic illustrations are stated as single entities.
This is somewhat misleading. For many of the cases a multiple
diagnosis was made. For example, 28 per cent of the cases of
inappropriate vocational choice also exhibited neurotic
symptoms and mild personality disorders. The data, however, were
tabulated in terms of the diagnosis that was considered by the
clinician to be the most significant one.


A third kind of classification was made to determine the 



frequency of types of treatment or recommendations. These
fall under the following headings:

(a) Placement advice: e.g., “Since your interests and
abilities fit the picture of successful salespeople, I
would recommend that you register with the X and Y
employment agencies.” “You should seek
employment in a more technical field than your present
occupation.” “You are faced with two alternatives:
taking over your father's business, or continuing as
an engineer. Your interests and personality traits
would suggest that you would be happier as an
engineer than as a business man.”

(b) Additional training recommended: e.g., “A
University Extension Course in Cost Estimating seems
indicated.” “In order to prepare for the position in
mind, you will have to return to college for two years
of graduate work.” “In order to capitalize on your
assets and interest, you should obtain the necessary
skills at such a school as The Blank Industrial
Training Institute.”

(c) Psychotherapy: e.g., recreational therapy, catharsis,
helping client to gain insight into family or other
conflict situation, suggestive therapy, group therapy,
relationship therapy, and so on.

(d) Referral to psychiatrist: Obvious psychiatric
problems.

(e) No advice or recommendations.

The results of the analyses are summarized in the three
tables. Table 1 shows the clinicians’ diagnoses for 76 male
clients and how they are related to the presence or absence of
primary patterns of interest and also to clients’ complaints.
Table 2 shows the same data for 24 female clients. Table 3
summarizes the treatment techniques as related to diagnoses
for the 100 cases.

Table 1 reveals one fact quite clearly: most adult males
who complain of occupational dissatisfaction show no primary
pattern of interest in the group of occupations which embraces
their present occupation. Sixty-two of the 76 men (82 per



TABLE 1

CLIENTS’ STATEMENT OF PROBLEM AND CLINICIANS’ DIAGNOSES IN TERMS OF
PRESENCE OR ABSENCE OF PRIMARY PATTERN OF INTEREST IN CURRENT
OCCUPATION

(N = 76 Men)

Columns (clinician's diagnosis, each divided into primary pattern
present and absent): inappropriate vocational choice; primary
personality disorders; insufficient education or training;
inappropriate job placement; other; total.

Rows (client's statement): dissatisfaction with occupational field;
dissatisfaction with specific job; dissatisfaction with future of
job; no specifically stated dissatisfaction; totals.

[Cell entries are not recoverable from the source. Legible totals
(present, absent): inappropriate vocational choice 2, 33; primary
personality disorders 8, 14.]


TABLE 2

CLIENTS’ STATEMENT OF PROBLEM AND CLINICIANS’ DIAGNOSES IN TERMS OF
PRESENCE OR ABSENCE OF PRIMARY PATTERN OF INTEREST IN CURRENT
OCCUPATION

(N = 24 Women)

Columns (clinician's diagnosis, each divided into primary pattern
present and absent): inappropriate vocational choice; primary
personality disorders; inappropriate job placement; other; total.

Rows (client's statement): dissatisfaction with occupational field;
dissatisfaction with specific job; no specifically stated
dissatisfaction; totals.

[Cell entries are not recoverable from the source. Legible totals
(present, absent): inappropriate vocational choice 1, 5; primary
personality disorders 4, 5; inappropriate job placement 1, 2.]


TABLE 3

ANALYSIS OF TREATMENT TECHNIQUES IN TERMS OF CLINICIANS’ DIAGNOSES

(N = 76 Men, 24 Women)

Columns (clinician's diagnosis, each divided into men and women):
inappropriate vocational choice; primary personality disorders;
insufficient education; inappropriate job placement; other.

Rows (treatment): placement advice; recommended additional
training; psychotherapy; referral to psychiatrist; no advice; totals.

[Cell entries are not recoverable from the source. Legible totals
(men, women): primary personality disorders 22, 9; insufficient
education 5, 0.]



cent) fall into this category. When we consider the women
who came to the Testing Bureau, the association is not so
clear cut. (See Table 2.) Fourteen of the 24 women clients
(58 per cent) had no primary pattern in the occupation in
which they were employed. When we consider the 10 women
who actually expressed dissatisfaction with their work (items
1 and 2 in Table 2), we see that eight (80 per cent) had no
primary pattern in the group of occupations which embraced
their present employment. These equivocal results in the case
of the women may be attributed to the inadequacy of the
Strong Blank for Women, to the smaller number of cases, or
to the generally accepted statement of sex differences in
intensity and variety of interests.
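
The percentages quoted for men and women can be checked directly from the counts given in the text. The sketch below is only an arithmetic check; the labels and the helper name are illustrative.

```python
def percent(part: int, whole: int) -> int:
    """Percentage of part in whole, rounded to the nearest whole number."""
    return round(100 * part / whole)


# Counts quoted in the text: (clients without a primary pattern, group size).
counts = {
    "men": (62, 76),
    "women": (14, 24),
    "women expressing dissatisfaction": (8, 10),
}

for label, (part, whole) in counts.items():
    print(f"{label}: {percent(part, whole)} per cent")
```

The three results are 82, 58, and 80 per cent, matching the figures in the text.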

Where the men actually expressed dissatisfaction, 70 per
cent referred to the occupational field rather than to the
particular job in which they were employed at the time of the
interview (items 1 and 3 versus totals of items 1-2-3). For
example, one junior high school teacher said: “It isn’t my job
at the Blank Junior High School that I don't like. As teaching
jobs go, it’s a good one. It’s the idea of spending the rest of
my life as a teacher that is my bogey.” A small number did
express dissatisfaction with their present jobs, but considered
the occupational field in which the job was located as a
desirable one. Only five individuals expressed satisfaction with
the field of the occupation and with the present job, but were
concerned over the future prospects of the job.

Analysis of the clinicians’ diagnoses reveals first, that the
diagnosis inappropriate vocational choice is the most
frequently-made diagnosis. Of the 35 men and six women who
were diagnosed in this way, 33 men and five women did not
show a primary pattern of interest on the Strong Blank in
the group of occupations which included their present
occupations. Reading of the case notes indicated that when the
interest test data were out of line with the present vocation, the
clinician almost invariably recorded the diagnosis as
inappropriate vocational choice, and was unable to find any fuller
diagnostic description more appropriate to the facts.

What diagnoses are made when the interest pattern agrees
with the present employment? In 24 cases (14 men, 10
women) the interest test showed a primary pattern which
coincided with the present job of the individual. Fourteen of
these (6 men, 8 women) expressed no specific dissatisfaction.
Of the 24 clients in this group, half were diagnosed as having
primary personality disorders. Of the remaining 12, the
diagnoses were about evenly distributed among the other
categories. The following hypothesis may be formulated from
these data: a person may have the vocational interests of
people successfully engaged in his present occupation, but
deep-seated personality disorders may otherwise interfere with
his occupational adjustment.


Table 3 represents the types of treatment commonly
employed by Testing Bureau clinicians. It is beyond the scope
of this paper to deal with the evaluation of the various kinds
of treatment. The treatment techniques most frequently used
were placement advice and recommended additional training.
These were used primarily where the diagnosis was
inappropriate vocational choice. As indicated before, these diagnoses
(and therefore the treatments) were based on data from the
Strong Vocational Interest Blank. Psychotherapy and referral
to psychiatrist were employed in most cases which were
diagnosed as primary personality difficulties.


It seems quite clear that these data allow us to test only
a limited hypothesis. Actually, we are using as reference
points Strong’s original criterion groups. The conclusions,
therefore, can only be stated tentatively until more extensive
samples are utilized. If we had selected 100 adults at random
among individuals who had not come to the Testing Bureau,
how many would have shown primary patterns of interest
which coincided with their present occupation? What
proportion would have been dissatisfied with their work? What




proportion would have adjusted to such dissatisfaction without 
the help of an outside agency? 

Partial answers to these questions are implied in a recent
monograph by Darley (3). He says that individuals who
continue in occupations which are at variance with their
interest pattern may:

“(1) Develop socially acceptable and compensatory
hobbies;

(2) Develop personality conflicts at home or on the job,
but still keep on the job;

(3) Re-define the specific job duties more in line with the
activities of the primary interest type . . . ;

(4) Establish a sufficiently poor work record to be only
marginally employable (without promotion) or to
be separated from the job . . .”

It is not improbable that this sampling is, in the main, 
composed of individuals who, while they may react in the 
alternative ways indicated by Darley, also seek the help of an 
available outside agency in finding an adjustment when they
experience dissatisfaction. 

To answer the questions raised in this discussion, crucial
experiments must be carried out. Until such research is
prosecuted, conclusions from these data must be made with caution.
The data seem to justify this conclusion: occupational
dissatisfaction is associated with a lack of primary interest in the
current occupation. What explanations may be offered? Two
alternatives immediately suggest themselves:

(1) A person’s interests are temporally stable; they are
relatively crystallized prior to entry into the occupational
world; when the occupational activities and the
interests are at variance, dissatisfaction results. The
dissatisfaction is a consequent or a resultant of a fixed
personality interacting within an occupational milieu.

(2) A person’s interests are temporally not stable; they
are flexible and subject to change subsequent to entry
into the occupational world; they may change as a result
of lack of success, environmental factors, or more
fundamental personality traits in interaction. The
dissatisfaction is antecedent to, or coincident with,
changes from a primary pattern of interests to no
primary pattern of interests in the present occupational
group.

To know which of these alternative explanations is correct
is important for clinicians who are approached for assistance
by vocationally-dissatisfied clients, and also by clients who are
about to make a vocational choice. If interests are fixed by
the time an individual is ready to seek employment, and if
dissatisfaction will result if the client enters an occupation
outside his interest type, then the clinician will advise him to
seek employment in certain restricted areas. If, on the other
hand, measured interests and satisfactions are the product of
successful achievement, then the clinician will advise clients to
seek employment where the greatest possibilities for success
are to be found in terms of the clients’ abilities and also
employment opportunities. Extensive longitudinal studies will
determine which of these alternatives, if either, is correct.
According to Darley’s review of the literature, the first
alternative seems more in line with available evidence (3).


A word is in order relevant to the psychological processes
which are represented by the Strong Vocational Interest
Blank. Carter, in using this instrument, suggests that patterns
of interest “become closely identified with the self.” Further,
the pattern of interests is in the nature of a set of values . . .
(2). In this connection Sarbin and Berdie have demonstrated
that certain relations exist between values as measured by the
Allport-Vernon Scale and interests as measured by the Strong
Blank (5). It is postulated that the summation of the 400
preferences on the Strong Blank reveals, at least in part, a
cross-section of what the individual would like to be; in short,
a person’s ideal conception of the self. The Freudian
expression, ego-ideal, carries approximately the same meaning.

Expressed occupational dissatisfaction, then, may be a
resultant of the conflict between the ego-ideal and the
occupational milieu or reality in which the individual applies this
ideal. When the reality is such that the
individual’s idealistic self-conception is tested and verified,
no conflict or dissatisfaction ensues. When reality prevents the
testing and verification of one’s ego-ideal, we find expressions
of occupational dissatisfaction.

This interpretation throws no further light on the
previously-posed problem as to which of the two alternative
explanations is the appropriate one. The problem is merely
restated in this form: is one’s conception of the self
(ego-ideal) a stable phenomenon or is it a variable one? Does it
change with each variation in reality, with success and failure
experiences? Further experimental work will illuminate some
of these dark corners.


Summary

Adults who complain of occupational dissatisfaction show,
in general, measured interest patterns which are not congruent 
with their present or modal occupations. If vocational in¬ 
terests are stable temporally, and if they have the dynamic 
character usually attributed to them, we may expect a high 
incidence of occupational maladjustment when individuals 
enter occupations for which they do not have the appropriate 
interests at the time of entry. 


REFERENCES

1. Beane, B., Carroll, J., and Habbe, S. “The Beane Poll of Favored
Psychological Tests”, Journal of Applied Psychology, XXIV,
(1940), 347-352.

2. Carter, H. D. “The Development of Vocational Attitudes”, Journal
of Consulting Psychology, IV, (1940), 185-191.

3. Darley, John G. Clinical Aspects and Interpretation of the Strong
Vocational Interest Blank. New York: The Psychological
Corporation, 1941, 71 pp.

4. Hoppock, Robert. Job Satisfaction. New York: Harper and
Brothers, 1935.

5. Sarbin, Theodore R. and Berdie, Ralph. “The Relation of Measured
Interests to the Allport-Vernon Study of Values”, Journal of
Applied Psychology, XXIV, (1940).

6. Strong, Edward K., Jr. Manual for Vocational Interest Blank for
Men. Stanford University Press, 1940.





MEASUREMENT ASPECTS OF THE NATIONAL
CLERICAL ABILITY TESTING PROGRAM


WILLIAM J. E. CRISSY
Cooperative Test Service

and

M. J. WANTMAN
University of Rochester

THE PURPOSE of the present article is threefold: to
discuss the measurement procedures employed by the
committee responsible for the National Clerical Ability Testing
Program; to cite some of the measurement problems that
confront the committee; and to suggest possible improvements
in the procedures and possible solutions to the problems
raised. While the organization, sponsorship, and administration
of the program have been described in detail elsewhere¹
and are outside the scope of this article, it is necessary to make
at least a summary statement concerning them in order to
orient the reader to the subsequent discussion.

The National Clerical Ability Testing Program is sponsored
by the National Office Management Association and the
National Council for Business Education. Its purpose is to
appraise the fitness of high school, business school, and college
graduates for beginning office positions in the fields of
stenography, typing, machine transcription, bookkeeping, calculating
machine operation, and filing. To assist in such an appraisal,
the National Clerical Ability Tests are administered annually
in centers throughout the country. In addition to tests of skill
in the fields referred to, a basic battery of tests is used to
measure the prospective employee’s competence in English,
arithmetic, business information, and general information.
Certificates are awarded which are based upon the examinee’s
proficiency demonstrated on the special skills tests and also
upon his general background as measured by the basic battery.
Most examinees are candidates for just one certificate and
take only one of the skills tests. However, as many as three
certificates may be sought by each examinee. Certificates are
awarded in each field in which the candidate reaches the
required proficiency.

¹ National Clerical Ability Tests. Bulletin No. 1, November 193[?]. Joint
Committee on Tests. (Cambridge, Mass.: Harvard University.)

National Clerical Ability Tests. Report on 1940 Testing Program. (Ibid.,
1940.)

W. J. E. Crissy, “The Testing Program of the Joint Committee of the
National Office Management Association and the Business Educational Council.”
Conference of State Testing Leaders (Proceedings of), October 2[?], 1939.
(Washington, D. C.: American Council on Education.)

The procedures and problems involved in this program
will be discussed under the following heads:

(1) The Separate Tests: a description of their form,
content, etc.

(2) Statistical Methodology: differential weighting of
each test for each group of candidates in a given
skills field.

(3) Certification Procedure.

The Separate Tests 

All tests in the basic battery in 1941 are of the objective
type and are scored by machine. The tests in English,
arithmetic, and business information are printed in a single booklet
called the Fundamentals Test. This test requires 90 minutes of
testing time. The areas of English sampled include spelling,
word usage, and the use of the apostrophe in possessives and
contractions. An improvement in the English Test would be
the inclusion of a section measuring the examinee’s knowledge
of punctuation (rather than only one aspect of it) and a
section testing vocabulary. The Arithmetic Test comprises
problems involving the four basic arithmetic operations, and
applied problems, such as the handling of discounts and the
computation of interest. All items in this test are in five-choice
form. The choices include four plausible answers and a fifth
alternative, “None of the above.” The examinee must actually
compute the answer to each problem since in certain items all
four of the plausible answers are incorrect. The Business
Information Test samples the applicant’s knowledge of office
procedures, postal regulations, technical business terms, and
their applications. The General Information Test measures
the examinee’s knowledge in such areas as world affairs, sports,
etiquette, geography, and history. It requires 50 minutes of
testing time. This test has a wide range of discriminability and
is positively correlated with every test in the battery, yet the
correlations indicate some independence of measurement. It
has been suggested that this test be replaced by a test of
general intelligence, but various personnel officers have reported
favorably on its inclusion in the battery. There is some
evidence from the use of the test in employment offices to indicate
that it has some value in predicting successful adjustment on
the job.

The tests in the skills fields are miniature tests, that is,
they present in miniature typical business situations in each of
the areas included. They are long enough to measure the
speed and accuracy with which particular tasks can be done
over a significant period of time.

The Stenography Test provides for 48 minutes of dictation
and 120 minutes of transcription. Fifteen items are
dictated including letters and memoranda to be edited.
Examinees are furnished printed copies of the letters to which
the dictated letters are replies. A relatively even speed of
dictation is maintained; the rate of dictation is 90 words per
minute. Unusual spelling or punctuation is explained, and
requests to repeat sections of the dictation are heeded if made
within a specified time. The administration procedures are a
departure from the usual school test in stenography but they
are in accordance with usual office conditions. In the
transcription similar steps have been taken to approximate
business practice: erasures are permitted, change of wording is
not penalized when the sense of the item is kept, small point
deductions are made for correctible errors, such as transposing
letters, while severe penalties are imposed for uncorrectible
errors such as omissions that would require interlineation. In
computing the total score a bonus, proportional to the number
of minutes remaining, is given to all candidates who hand in
their transcriptions before “time” is called. The chief problem
in connection with this test is concerned with scoring. It seems
logical that, in terms of office practice, speed does not become
advantageous until some acceptable level of accuracy is
reached. Under the present plan of scoring, no such level of
accuracy has been specified and hence some examinees obtain
high scores due to their typing speed while their accuracy is
at a level of doubtful acceptance in beginning office work.


The Typing Test permits a maximum of 120 minutes of
working time. Form letters, reply cards, etc., are furnished
the examinee, and he is given several typing jobs which involve
the use of the materials furnished. This test approximates
office conditions, as does the Stenography Test, and it
provides a more comprehensive measure of typing ability than
can be obtained from the usual kind of test in this field.
However, the scoring problem is the same here as in the
Stenography Test. No provision is made for a minimum accuracy
score below which a bonus for time saved may not be added
when the total score is computed.
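
The scoring change suggested for the Stenography and Typing Tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the bonus rate of half a point per minute, and the minimum accuracy score of 60 are hypothetical and do not come from the program's scoring manual.

```python
def total_score(accuracy_score, minutes_remaining,
                bonus_per_minute=0.5, min_accuracy=None):
    """Return the accuracy score plus a bonus for time saved.

    bonus_per_minute and min_accuracy are hypothetical parameters.
    With min_accuracy=None the bonus is always added, as in the plan
    described in the text; otherwise the bonus is added only when the
    accuracy score reaches the stated minimum.
    """
    bonus = bonus_per_minute * minutes_remaining
    if min_accuracy is not None and accuracy_score < min_accuracy:
        bonus = 0.0
    return accuracy_score + bonus


# Two hypothetical examinees: (accuracy score, minutes remaining).
fast_inaccurate = (50, 40)
slow_accurate = (65, 0)

# Bonus always added: the fast, inaccurate examinee comes out ahead.
present_plan = [total_score(a, m) for a, m in (fast_inaccurate, slow_accurate)]

# Bonus withheld below a minimum accuracy score of 60.
gated_plan = [total_score(a, m, min_accuracy=60)
              for a, m in (fast_inaccurate, slow_accurate)]
```

With these made-up figures the ungated rule ranks the inaccurate examinee first, while the gated rule reverses the order, which is the effect the proposed minimum accuracy score is meant to have.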


The Machine Transcription Test involves the transcription
of seven items from either an Ediphone or Dictaphone
cylinder. A maximum of 60 minutes is allowed for the
transcription. Scoring procedures are in line with business practice
except, again, for the inadequate method of handling the speed
aspect of the score.


The Bookkeeping Test requires the examinee to work to
completion specified operations on excerpts from a set of
books. The format of this test has been approved and
recommended by experts in bookkeeping and accounting procedures.
There is evidence to indicate that it measures bookkeeping
more adequately than do tests involving indirect evidences of
ability in this field.

The Machine Calculation Test measures the examinee’s
ability to carry out the four basic arithmetic operations on a
key-driven calculating machine. The addition problems cover
columnar summing and cross-footing. The multiplication
section ranges from simple multiplication to multiplying to obtain
a sum of the products. The subtraction problems extend from
the very simplest single subtractions through problems
requiring first alternate columnar summing and then subtractions to
obtain balances. The division section includes questions in
direct division and also in obtaining reciprocals to be used as
multipliers. The entire test requires 120 minutes of working
time.

The Filing Test samples the examinee’s ability to file
various materials furnished and also his knowledge of
acceptable filing procedures in the solution of problems that
frequently occur in office practice. Alternative sections are
included in the last part of the test to cover different filing
systems. The test requires 120 minutes of working time.

Statistical Methodology

In order to obtain an over-all appraisal of the examinee’s
ability, scores on the basic battery and the skills test are
combined into a “best-weighted” composite. Since the
competencies measured by the various tests in the basic battery are of
different importance in each of the six skills fields, there exist
six different weighting applications, one for each group of
examinees taking each of the skills tests.

Four components make up the weight accorded to each 
test (both basic battery tests and skills test) within each of 
the six fields: 

(1) The variability or dispersion of the scores made by 
the group of examinees. 

(2) A function of the test’s reliability for the group; the
quasi-regression weight of the test.

(3) The estimated importance of the test. 

(4) The independence or uniqueness of the test. 



The method of combining these components may be
symbolized thus:

$$W_{ij} = \frac{\beta'_{ij}\,(I_{ij} + U_{ij})}{\sigma_{ij}}$$

where

$W_{ij}$ = weight accorded test i within a particular skills
battery j (a skills battery includes a skills test and
the basic battery);

$\beta'_{ij}$ = quasi-regression weight of the test i in battery j;

$I_{ij}$ = estimated importance of the competency measured
by test i in the battery j;

$U_{ij}$ = uniqueness of test i in battery j;

$\sigma_{ij}$ = standard deviation of scores on the test i of persons
selecting the skills test designated for battery j.

Standard deviations are computed within skills groups for
each test and are used in the weighting procedures as indicated
in the formula presented above. This furnishes the first
component of the weight for each test in each battery.

No criteria are now available² against which to obtain
regression weights on the tests. However, to account for the
second component, quasi-regression weights are computed
using Kelley’s formula³: $\beta' = \dfrac{\sqrt{r_{11}}}{1 - r_{11}}$, where $r_{11}$ is the
reliability coefficient of the test.

The third component of each test weight is determined by
having a committee of experts in each skills field independently
judge the importance of the competency measured by each of
the basic tests (English, arithmetic, business information, and
general information) relative to a basic weight of 10 accorded
the skills test in that field. For example, in stenography, the
importance weights are:

English 3
Arithmetic 1
Business Information 2
General Information 3
Stenography 10

² A validation study is in progress involving 1939 and 1940 examinees.

³ T. L. Kelley, Interpretation of Educational Measurements (Yonkers: World
Book Company, 1927), pp. 212-213.

Uniqueness, the fourth component, is measured by using
the median alienation coefficient for each test within a
particular battery. This involves obtaining six intercorrelation
matrices, each matrix including a particular skills test and the
basic tests (5 x 5 matrix). The median correlation coefficient
in each row is then used to obtain the alienation coefficient
indicated above.
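
The uniqueness step can be sketched as follows. The sketch uses the usual definition of the coefficient of alienation, k = sqrt(1 - r^2), applied to the median correlation in each row. Two things are assumptions: the intercorrelation matrix below is invented for illustration (it is not the 1941 data), and the diagonal entry is left out before the median is taken, which the text does not state explicitly.

```python
import math
from statistics import median

tests = ["Stenography", "Bus. Inf.", "English", "Bus. Arith.", "Gen. Inf."]

# Hypothetical 5 x 5 intercorrelation matrix (symmetric, unit diagonal).
r = [
    [1.00, 0.20, 0.25, 0.15, 0.18],
    [0.20, 1.00, 0.45, 0.40, 0.50],
    [0.25, 0.45, 1.00, 0.35, 0.42],
    [0.15, 0.40, 0.35, 1.00, 0.30],
    [0.18, 0.50, 0.42, 0.30, 1.00],
]


def uniqueness(row_index, matrix):
    """Alienation coefficient based on the median off-diagonal correlation."""
    off_diagonal = [value for j, value in enumerate(matrix[row_index])
                    if j != row_index]
    r_median = median(off_diagonal)
    return math.sqrt(1.0 - r_median ** 2)


# One uniqueness value per test, rounded as in a worksheet.
k = {name: round(uniqueness(i, r), 3) for i, name in enumerate(tests)}
```

A test that correlates weakly with the rest of its battery receives a coefficient near 1, and a test that largely duplicates the others receives a smaller one.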

In order to “equalize” weights for importance and
uniqueness so that the sums of the two sets of weights will be equal
before the several components of the weight, $W_{ij}$, are
combined, the alienation coefficients (uniqueness components) are
each multiplied by a constant equal to the columnar sum of the
five importance weights divided by the columnar sum of the
five alienation coefficients. This is done in the case of each of
the six skills batteries.

To illustrate the weighting procedure, the computational
steps in the case of the 1941 Stenography battery are
indicated in Table 1.


TABLE 1

STENOGRAPHY

A             B       C     C′    D       E     F    G       H         I       J
Stenography   46.55         .90   9.487   .98   10   3.995   132.771   2.852   3.08
Bus. Inf.     11.48   .66   .59   1.873   .90    2   3.669    10.618    .925   1.00
English        7.66   .84   .70   2.789   .93    3   3.792    18.943   2.473   2.67
Bus. Arith.    4.29   .75   .73   3.164   .93    1   3.792    15.162   3.534   3.82
Gen. Inf.     22.31   .84   .82   5.031   .92    3   3.751    33.964   1.522   1.65

ΣE = 4.66     ΣF = 19


Column   Data Presented

A    Tests included in Stenography battery.

B    Standard deviation for each test within the Stenography
     group ($\sigma_{ij}$).

C    Reliability coefficients computed by “split-half”
     method based upon sample.

C′   Adjusted reliability coefficients (corrected to range
     of Stenography group) by formula: [formula illegible]

D    Quasi-regression weights ($\beta'_{ij}$).

E    Uniqueness weights (median k’s).

F    Judges’ weights of importance ($I_{ij}$).

G    Entries in column E adjusted relative to entries in
     column F by the formula: $G = \dfrac{\sum F}{\sum E}\,(E)$

H    $H = D(F + G) = \beta'_{ij}(I_{ij} + U_{ij})$

I    $I = \dfrac{H}{B} = \dfrac{\beta'_{ij}(I_{ij} + U_{ij})}{\sigma_{ij}} = W_{ij}$

J    Entries in I, each divided by smallest entry in I.
     (Column J contains the weights which are finally
     used. This makes subsequent computation easier by
     making the smallest $W_{ij}$ equal to unity.)

When the weight for each test has been obtained by the
foregoing procedures, a composite score for each examinee is 
obtained by multiplying each score by the appropriate weight 
and summing his weighted scores. 
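
The computational steps for the Stenography battery can be retraced with a short Python sketch. The standard deviations, adjusted reliabilities, uniqueness weights, and importance weights are taken from Table 1; the function and variable names, and the examinee's scores at the end, are illustrative assumptions rather than part of the program's procedure.

```python
import math

# test: (standard deviation, adjusted reliability, uniqueness, importance)
battery = {
    "Stenography": (46.55, 0.90, 0.98, 10),
    "Bus. Inf.":   (11.48, 0.59, 0.90, 2),
    "English":     (7.66,  0.70, 0.93, 3),
    "Bus. Arith.": (4.29,  0.73, 0.93, 1),
    "Gen. Inf.":   (22.31, 0.82, 0.92, 3),
}


def quasi_regression(reliability):
    """Quasi-regression weight: sqrt(r) / (1 - r)."""
    return math.sqrt(reliability) / (1.0 - reliability)


# Constant that equalizes the uniqueness and importance sums.
sum_importance = sum(imp for _, _, _, imp in battery.values())
sum_uniqueness = sum(uniq for _, _, uniq, _ in battery.values())
scale = sum_importance / sum_uniqueness

raw_weights = {}
for name, (sd, reliability, uniq, imp) in battery.items():
    beta = quasi_regression(reliability)
    adjusted_uniqueness = scale * uniq
    raw_weights[name] = beta * (imp + adjusted_uniqueness) / sd

# Divide by the smallest weight so that it becomes unity.
smallest = min(raw_weights.values())
weights = {name: w / smallest for name, w in raw_weights.items()}


def composite(scores, test_weights):
    """Weighted sum of one examinee's scores."""
    return sum(test_weights[name] * scores[name] for name in test_weights)


# Hypothetical examinee scores, for illustration only.
examinee = {"Stenography": 150, "Bus. Inf.": 40, "English": 30,
            "Bus. Arith.": 15, "Gen. Inf.": 60}
examinee_composite = composite(examinee, weights)
```

Rounded to two decimals, the computed weights agree with the final column of Table 1 (3.08, 1.00, 2.67, 3.82, and 1.65), which supports the reading of the formulas given in the column notes.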

Certification Procedure 

The certification of an examinee in a particular skills field
depends upon two factors: (1) having a skills test score equal
to or in excess of the critical or “passing” score set for that
particular skills test; and, in addition, (2) having a composite
score equal to or in excess of the critical or “passing” score set
for that particular composite.

The critical score for each skills test is established by a
committee of experts in that field (usually the same committee
that estimates the importance weights). The criterion used by
each committee is “minimum acceptable performance in a
beginning office job.” The procedure used is to have each
committee member inspect and judge as acceptable or
unacceptable a sample paper from each two per cent segment of the
distribution beginning at the twentieth percentile point and
extending to the eightieth percentile point. After independent
judgments are completed, the combined judgment of the
committee is used to determine the critical score on the particular
skills test.

To determine the critical composite score, regression
technique is employed; the desired critical composite score is
predicted from the previously determined critical skills score
through regression of composite on skill.
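
The regression step can be illustrated with an ordinary least-squares line of composite on skill. The sketch below is a minimal one under stated assumptions: the five pairs of scores and the critical skills score of 165 are invented, and the function name is hypothetical.

```python
from statistics import mean


def regress_composite_on_skill(skill, composite):
    """Least-squares slope and intercept for predicting composite from skill."""
    mean_skill, mean_composite = mean(skill), mean(composite)
    sum_sq = sum((s - mean_skill) ** 2 for s in skill)
    sum_cross = sum((s - mean_skill) * (c - mean_composite)
                    for s, c in zip(skill, composite))
    slope = sum_cross / sum_sq
    intercept = mean_composite - slope * mean_skill
    return slope, intercept


# Hypothetical skills and composite scores for five examinees.
skill = [120, 150, 180, 210, 240]
composite = [520, 600, 700, 760, 850]

slope, intercept = regress_composite_on_skill(skill, composite)

# Critical skills score set by the committee (hypothetical value).
critical_skill = 165
critical_composite = slope * critical_skill + intercept
```

An examinee would then need both a skills score of at least `critical_skill` and a composite of at least `critical_composite` to be certified.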

Obviously the correlation between each skills test and the
corresponding composite is high, since the skills test is the
chief component of the composite. An improvement in this
procedure would be to exclude the skills test from the
composite and thus use the composite as a “background index.”
Then if the critical composite score were obtained by
prediction from the critical skills test score by means of a regression
equation involving composite and skill, the only hypothesis
involved would be that the minimum “background” score should
be about equivalent to the minimum skills score.


Summary

In this paper have been discussed the measurement
procedures and problems connected with the National Clerical
Ability Testing Program. The treatment of problems has been
limited to those which are peculiar to this particular program.
The procedures, however, have been covered in detail because
they have general application to other types of testing
projects.

The weighting and certification procedures described in
this paper should obviously be completely revised as soon as
“outside criteria” are available against which to weight the
tests. Probably the best procedure to use when these criteria
are available is canonical correlation technique modified to
include importance weightings of both the separate criteria and
the separate tests.

So long as such outside criteria are not available, the
procedures used at present should be modified either in
accordance with the suggestions made in this paper or in some other
manner if the program is to render increased service to
prospective clerical employees and to the employers of such
persons.




INTROVERSION-EXTROVERSION AS A FACTOR IN 
TEACHER-TRAINING 


CATHARINE EVANS 
Indiana University

and 

C. GILBERT WRENN 
University of Minnesota 


Introduction 

ONE OF THE SERIOUS problems facing teacher-training
institutions today is the selection of student personnel.
Many teachers are unsatisfactory either because of inadequate
training or unfortunate personality characteristics.
Teacher-training institutions must select students who can benefit most
from the improved training programs now provided. This
study is intended to throw some light on the relationship of
personality traits to student success in teacher-training
programs. More specifically, the purpose of this study has been
to determine the relationship of Introversion-Extroversion¹ to
the scholastic achievement and student teaching success of
education students. An I-E Inventory was administered to
396 seniors in the College of Education at the University of
Minnesota. This inventory will be described briefly in order
that the results of the study may be understood clearly. A
more complete discussion of the construction of the inventory
is available in a recent article in the Journal of Psychology
(1).

This inventory was constructed to measure three types of
I-E, Thinking, Social, and Emotional, which were isolated by
Guilford (2) in his factor analysis of I-E. Original items
were developed and stated in the form of questions
concerning the behavior and reactions of the student. The questions
were formulated in such a manner that the student could
indicate how frequently he or she behaved in that way. Typical
questions were, “Do you question statements and ideas
expressed by your professors?” “Do you enjoy eating meals
alone?” and “Do you avoid exaggeration in your statements?”

¹ Throughout the remainder of the article, Introversion-Extroversion will be
designated as I-E.

The construction and choice of items for the three tests
in the inventory were guided by the following definitions which
contrast the extremes for each type of I-E:

The thinking introvert likes reflective thought, particularly
that of a more abstract nature. His thinking tends to be less
dominated by objective conditions and generally accepted ideas
than thinking of the extrovert. The thinking extrovert,
however, shows a liking for overt action, and his ideas tend to be
ideas of overt action.

The social introvert withdraws from social contacts and
responsibilities and displays little interest in people. In
contrast, the social extrovert seeks social contacts and depends
upon them for his satisfaction.

The emotional introvert tends to repress and inhibit
outward expression of his emotions and feelings. On the other
hand, the emotional extrovert readily expresses his emotions
and feelings outwardly. He shows a greater tendency to make
the expected response to simple, direct emotional appeals than
the introvert.

In constructing this Inventory an effort was made to
develop relatively independent measures for these three types of
I-E by a technique of item analysis. The intercorrelation
coefficients for the 396 College of Education seniors were:
Thinking and Social I-E tests, -.25; Thinking and Emotional
I-E tests, +.17; and Social and Emotional I-E tests, +.23.
These low coefficients indicate that the three tests are relatively
independent. The inventory also seems sufficiently reliable for
individual prediction since each of the tests has a reliability
coefficient with groups of education students of .88 or above,
for either the retest or split-half technique or for both
techniques.

Indirect evidence concerning the validity of each test is
available, in terms of the ability of the test to differentiate
known groups of college students which on an a priori basis
seemed to be extreme in the type of I-E involved. For example,
the Thinking I-E test significantly differentiated major groups
in the College of Education; the majors in physical education,
home economics, commercial education, and child welfare were
extreme in the direction of Thinking Extroversion, while the
majors in English, art, mathematics, social studies, and
languages were extreme in the direction of Thinking Introversion.
The Social I-E test significantly differentiated groups of
students varying in the degree of participation in campus
activities; the members of academic sororities and fraternities and
the students active in campus organizations were found to be
more socially extroverted than the non-affiliates and
non-participants. Likewise, the Emotional I-E test significantly
differentiated sex and age groups; women were more emotionally
extroverted than men, and the younger student groups were
more emotionally extroverted than older groups. Each test
did differentiate known groups of students which logically
were expected to be extreme in that type of I-E.

Scholastic Achievement of Student Groups Varying in 
Thinking I-E 

The relationships of the scores on the three tests in the 
I-E Inventory to measures of scholastic and student teaching 
success have been explored in this study of College of Educa¬ 
tion seniors. 

The relationship of Thinking I-E to scholastic achievement
honor point ratios was explored for the seniors in the
College of Education who had taken a scholastic aptitude test,
the Miller Analogies Test, during the junior or senior year.
The Miller Analogies Test, Form G, consists of 100
analogies which research with college students indicated were
discriminating items. Data reported by Dugan (3) indicate that
this test is highly reliable and valid as a measure of scholastic
aptitude. The correlation between the scores on the Thinking
I-E and the Miller Analogies Tests varied from .15 to .26
with groups of 112 to 260 education students, indicating only
a small positive relationship.

Significant results were obtained in the study of the
scholastic achievement of 148 native students, i.e., students
who had taken all of their college work at the University of
Minnesota. These students were divided into four groups
according to the varying size of their scores on the Thinking
I-E test, and the mean honor point ratios for each group were
computed for the total course work, for major courses, and
for education courses. There was in general a progressive
increase in the mean major and education honor point ratios
with an increase in introversion in this group. Those students
with Thinking I-E scores in the quarter extreme in the
direction of introversion had significantly higher mean honor point
ratios than those with scores in the quarter extreme in the
direction of extroversion (see Table 1).


TABLE 1

MEAN HONOR POINT RATIOS IN TOTAL, MAJOR, AND EDUCATION COURSES
FOR FOUR GROUPS OF NATIVE STUDENTS VARYING IN
DEGREE OF THINKING I-E

                                   Mean Honor Point Ratio
Thinking I-E Groups                Total         Major       Education
Upper Quarter (Extroversion)       1.2711        1.674[?]    [illegible]
Second Quarter                     [illegible]   1.7878      1.8050
Third Quarter                      1.6776        1.[??]51    1.8849
Lowest Quarter (Introversion)      1.6138        1.9202      1.892[?]


DIFFERENCE AND THE SIGNIFICANCE OF THE DIFFERENCE IN MEAN
HONOR POINT RATIOS FOR THE EXTREME THINKING
QUARTILE GROUPS OF NATIVE STUDENTS

                                   Difference                Probability
Variable                           in means      t           of t
Total Honor Point Ratio            .2427         2.9241      .01
Major Honor Point Ratio            .2549         2.7117      .01
Education Honor Point Ratio        .3369         2.9553      .01



The analysis of variance technique (4) was applied to the
total, major, and education honor point ratios for the four
Thinking I-E groups in order to test the significance of the
difference in means. The variance among the mean honor
point ratios of the Thinking I-E groups was significantly
larger for each of the three analyses of variance than the
variance within the groups. The ratio of the variances for
each type of honor point ratio satisfied at least the five per
cent level of significance. The results of the analysis of
variance for each type of honor point ratio refuted the hypothesis
that there was no difference in the scholastic achievement of
the four Thinking I-E groups. The groups were heterogeneous
in scholastic achievement (see Table 2).
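
The variance ratio used in this kind of analysis can be sketched for a single criterion of classification. The honor point ratios below are invented for illustration and are not the study's data; the function name is likewise hypothetical. The resulting ratio would be compared with a tabled F value for the corresponding degrees of freedom.

```python
from statistics import mean


def one_way_f(groups):
    """Ratio of the variance among group means to the variance within groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within


# Hypothetical honor point ratios for four Thinking I-E quarters,
# ordered from the extroverted quarter to the introverted quarter.
groups = [
    [1.1, 1.4, 1.3, 1.2],
    [1.4, 1.5, 1.6, 1.3],
    [1.6, 1.7, 1.5, 1.8],
    [1.7, 1.9, 1.6, 1.8],
]

f_ratio = one_way_f(groups)
```

A ratio well above 1 indicates that the group means differ by more than the scatter within the groups would lead one to expect.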

The analysis of variance technique with the two criteria of
classification, Thinking I-E and Miller Analogies scores, was
also employed with the total honor point ratios in order to
determine whether or not the variance among the mean honor
point ratios of the four Thinking I-E groups would be
significant when the groups were subdivided according to
Analogies ability.

The variance between the Thinking I-E groups was not
significantly greater than the variance within the
Thinking-Analogies subclasses. When the variance within the Analogies
quartile groups was considered, there was insufficient evidence
to determine any difference in the scholastic achievement of
the four Thinking I-E quartile groups.

From the analysis of variance data the mean honor point
ratios of the following four groups of native students varying
both in degree of Thinking I-E and in Analogies ability were
compared:

(1) Students below the median in the direction of 'fhink- 
ing Introversion and above the median In the ability 
measured by the Analogies I'est. 

(2) Students above the median injThe djfectron of Think-, 

ing Extroversion and iib()vc| tnir SH,;y]L(i|ilmie^ > 

ability. 




(3) Students below the median in the direction of Thinking
Introversion and below the median in Analogies
ability.

(4) Students above the median in the direction of Thinking
Extroversion and below the median in Analogies
ability.

A steady decrease in mean total honor point ratio from the
first to the fourth groups can be noted in Table 3.


TABLE 3

MEAN TOTAL HONOR POINT RATIOS FOR FOUR GROUPS OF NATIVE STUDENTS
VARYING IN THE DEGREE OF THINKING I-E AND OF THE
ABILITY MEASURED BY THE MILLER ANALOGIES TEST

Number                                                      Mean Total
of Group   Type of Groups                         Number    Honor Point Ratio

1          Below the median in the direction of
           Thinking Introversion and above the
           median in Analogies ability.           4{»       l.75(A

2          Above the median in the direction of
           Thinking Extroversion and above the
           median in Analogies ability.           JR        I.^JIK

3          Below the median in the direction of
           Thinking Introversion and below the
           median in Analogies ability.           J.R       ] .TLJo

4          Above the median in the direction of
           Thinking Extroversion and below the
           median in Analogies ability.           4()       1.400(1


The first group had a significantly higher mean honor
point ratio than the second group (t = 2.24). However, this
second group did not have a significantly higher mean honor
point ratio than the third (t = .58). These data seem to
indicate that high Analogies ability combined with a tendency
toward Thinking Introversion is more ideal from the standpoint
of scholastic achievement than either high Analogies
ability combined with a tendency toward Thinking Extroversion
[...] Thinking Introversion and
high Analogies ability to scholastic achievement was attacked



from another angle. The 24 native students who had honor
point ratios above 2.00, and the 26 who had honor point ratios
below 1.50 were compared in mean scores on the Thinking
and Analogies Tests (Table 4). The students with the higher
honor point ratios had significantly higher Thinking Introversion
scores and Analogies scores than those students with lower
honor point ratios. The values of “t” were 2.51 and 3.5S,
respectively, and they satisfied at least the five per cent level
of significance.
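
A comparison of two group means by "t" might be sketched as below. The article does not state which form of the t ratio was used, so the pooled-variance form here is an assumption, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Return (t, df) for two independent groups, pooling their variances."""
    n1, n2 = len(a), len(b)
    m1, m2 = mean(a), mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)

    df = n1 + n2 - 2
    pooled_variance = (ss1 + ss2) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


# Hypothetical Analogies scores for a high and a low honor point group.
high_group = [62, 58, 65, 60, 57, 63]
low_group = [50, 47, 55, 49, 52, 46]
t, df = pooled_t(high_group, low_group)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The value of t is then compared with the tabled value for the given degrees of freedom at the chosen level of significance.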

TABLE 4

MEAN THINKING AND ANALOGIES SCORES OF NATIVE STUDENTS WITH
HONOR POINT RATIOS ABOVE 2.00 AND OF NATIVE STUDENTS
WITH HONOR POINT RATIOS BELOW 1.50

                                                  Mean Thinking   Mean Analogies
Type of Group                              No.    I-E Score*      Score

Natives with Honor Point Ratios
above 2.00                                 24     O.B.kkI         CiO.OS.LI

Natives with Honor Point Ratios
below 1.50                                 26     2,14615         4S.65.IB

*The smaller the score, the greater the tendency toward introversion.


The evidence for the relationship of Thinking Introversion
to scholastic achievement for native students seems weakened
by the non-significant results obtained in the analysis of variance
by the double criteria of classification. However, the
study of group differences points to the desirability of the combination
of a tendency to Thinking Introversion with high
Analogies ability. Indeed, the results of the analysis of variance
by the double criteria of Thinking I-E and Analogies
ability can be interpreted as strengthening the evidence for the
greater desirability of the combination of Thinking Introversion
with high Analogies ability in contrast to the desirability
of a tendency to Thinking Introversion alone regardless of
Analogies ability.

When transfer students, i.e., those students transferring
to the University of Minnesota with advanced standing from
other institutions, were studied, the results were not significant.
Although the mean honor point ratios of the transfer students
in the quartile group extreme in the direction of Thinking
Introversion were larger than the means of those in the quarter
extreme in the direction of extroversion, the differences were
not significant. Likewise, the results of the analysis of variance
were not significant. However, significant differences were
found between samples of native and transfer students in the
distribution by major fields, in the mean scores on the
Analogies test, and in the means of the major and education
honor point ratios. The explanation of the contrasting results
obtained with native and transfer students must lie in these
differences. It may be safely concluded, however, that there is
a relationship between Thinking Introversion and scholastic
success for the native student in the College of Education.

The Student Teaching Success of Groups of
Students Varying in I-E

The relationships of scores on the three I-E tests to student
teaching success at the University of Minnesota were
also explored. Two measures of student teaching success were
employed in this study. In the first place, the rank order lists
of the student teachers in 16 major fields were obtained. These
rank order lists were made out by the combined group of critic
teachers for a major field. Second, the marks in student teaching
for the three quarters were computed as an honor point
ratio for 242 seniors.

The coefficients of correlation between the student teaching 
ranks and the ranks of the scores on the three I-E tests were
computed for the majors in eight of the 16 teaching fields. 
These were the majors which numbered 14 or more cases. 
The Spearman rank difference formula was employed in the 
calculation of these rank correlation coefficients. 
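
The Spearman rank difference formula, rho = 1 - 6(sum of d squared) / (n(n squared - 1)), can be sketched as below. The ranks are hypothetical, and the sketch assumes that no ranks are tied; the treatment of ties in the study is not stated.

```python
def spearman_rank_difference(ranks_x, ranks_y):
    """Rank correlation from paired ranks, assuming no tied ranks."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("need two rank lists of equal length, n >= 2")
    # d is the difference between the two ranks held by each subject.
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks for six student teachers (illustration only):
# rank in student teaching and rank of score on one I-E test.
teaching_rank = [1, 2, 3, 4, 5, 6]
test_rank = [2, 1, 4, 3, 6, 5]
print(round(spearman_rank_difference(teaching_rank, test_rank), 2))
```

With samples as small as 14 to 50 cases, a coefficient computed in this way has a large sampling error, which is why the coefficients reported here are interpreted cautiously.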

These coefficients as given in Table 5 varied from .00 to
.43. They were based on such small samplings that it was
improbable that any of the coefficients were significant. All of
the coefficients of correlation between Social I-E and student


TABLE 5

RANK COEFFICIENTS OF CORRELATION BETWEEN STUDENT TEACHING
RANKS AND THE THREE I-E TESTS FOR EIGHT MAJOR FIELDS

                                         Thinking    Social     Emotional
   Major Field                    N      I-E Test    I-E Test   I-E Test

1. English                        50       -.23        +.19       +.24
2. Social Science                 4*1      -.00        +.17       +.23
3. Child Welfare                  38       -.20        +.12       -.03
4. Science                        19       -.04        +.38       -.38
5. Commercial                     19       -.18        +.43       -.12
6. Art                            17       +.37        +.19       +.16
7. Girls' Physical Education      15       -.37        +.35       -.41
8. Mathematics                    14        .00        +.25       +.18


teaching rank, however, were positive. This consistent positive
relationship of Social Extroversion and student teaching
rank for the eight major fields does seem to indicate the type
of relationship existing between these two factors. Six of the
eight coefficients between student teaching ranks and the scores
on the Thinking I-E Test were negative. Thus a tendency was
indicated for Thinking Introversion to be related to student
teaching success. The relationship of Emotional I-E to student
teaching rank was not consistent in the eight major fields.
Four coefficients were positive, and four were negative.

The relationship of student teaching success to the I-E
scores was also determined for SS students whose student 
teaching ranks were in the upper and lower quarters on the 
rank order lists of the same eight major fields. This analysis 
indicated that the more successful student teachers tended 
toward more thinking introversion, social extroversion, and 
emotional extroversion in central tendency than the less successful
student teachers. However, only one of these differences
in mean I-E scores was significant. The students in the
upper fourth on student teaching rank order lists were
significantly more socially extroverted than the students in the
lower fourth; the value of “t,” 2.58, satisfied the five per
cent level of significance. There were, on the other hand, more
than five chances out of one hundred that the differences in the



mean Emotional and Thinking scores could have occurred 
from chance errors of sampling.

Since there was a significant difference in the mean scores
on the Social I-E test for the groups in the upper and lower
fourths on the student teaching rank order lists, the analysis
of variance technique was employed to study these differences.
The mean scores on the Social I-E test of the four groups
formed on the basis of ranks in student teaching are given in
Table 6. There was a progressive increase in mean scores on


TABLE 6

MEAN SCORES ON THE SOCIAL I-E TEST OF THE FOUR GROUPS FORMED ON
THE BASIS OF RANK IN STUDENT TEACHING

Groups Varying in                           Mean
Student Teaching Rank            No.        Social Score*

Upper Quarter                    ....       1 ‘'.'Uf.t
Second Quarter                   S7         1 5.2().!d
Third Quarter                    54         7 7‘)(i.5
Lower Quarter                    'll        4.()7J7

*The larger the score, the greater the degree of extroversion.


the Social I-E test for the four groups with the increase in
student teaching rank. In other words, the tendency to Social
Extroversion increased as student teaching rank became
higher. The simple analysis of variance was employed to test
the null hypothesis that there was no difference in the four
groups from which the mean Social I-E scores were obtained.
The variance among the four groups varying in practice teaching
ranks was significantly greater than the variance within
the groups, indicating that the four groups were not homogeneous
in Social I-E. These data indicated that Social Extroversion
was related to student teaching success as measured by
the ranks of critic teachers. The more successful the student
teacher, the greater was the tendency to Social Extroversion.

The marks in student teaching for three quarters of work
were also employed as a criterion for the choice of two groups
of students of extreme degrees of success in teaching. A




majority of the seniors had student teaching honor point ratios 
between 2.00 and 2.50. There were 68 students with ratios
above 2.50 and 67 students with ratios below 2.00. 

The mean scores on the I-E tests of these two groups were
compared. The same differences were found for these two
groups as for the two groups chosen by the criterion of student
teaching ranks with the exception of the scores on the
Thinking I-E test. The mean scores indicated that the more
successful student teachers as judged by their marks were more
extroverted in Thinking, Social, and Emotional I-E than the
less successful teachers. However, the values of “t” were so
small that none of the three differences in means was significant.
The rank order lists seemed to yield a more differentiating
measure of student teaching success than marks in student
teaching. In fact, students with a B average (2.00 ratio) in
student teaching varied in the rank given by critic teachers
from the lowest to the upper quarter.

The results of the use of the two criteria for teaching
success indicate that the more successful student teacher is on
the average more extroverted, socially and emotionally, than
the less successful student teacher. No conclusion can be made
in relation to the Thinking I-E test because of the conflicting
results.

Summary 

The results of this study indicate that for “native” seniors
in the College of Education, Thinking Introversion is related
to high scholastic achievement and that Social and Emotional
Extroversion are related to student teaching success. These
results provide evidence to substantiate a common “hunch”
that good teachers are not only good students but also must
possess certain social and emotional characteristics. The results
indicate that a combination of high mental ability and
Thinking Introversion is desirable for scholastic success. In
addition, Social Extroversion is also necessary for high rank
in student teaching. The extent to which the I-E Inventory¹


¹This inventory will be published soon by Science Research Associates.



can be used to predict scholastic success in the College of
Education or student teaching success has not yet been determined.
The I-E Inventory is being used, however, in a
current study in this same college that will follow a class of
juniors through a four-year period. By studying their success
in student teaching and on the job, it will be possible to indicate
the predictive values of the inventory and other instruments
not only for the training period but also for job
adjustment.


REFERENCES

1. C. Evans and T. R. McConnell. “A New Measure of Introversion-Extroversion.”
Journal of Psychology, XII (1941), 111-124.

2. J. P. Guilford and R. B. Guilford. “Personality Factors, S, E, and
M, and Their Measurement.” Journal of Psychology, II (1936),
109-127.

3. Willis E. Dugan. “A Study of the Miller Analogies Test with
Graduate Students in the College of Education.” Unpublished
Master's Thesis. University of Minnesota, I'HD.

4. George W. Snedecor. Statistical Methods. Ames, Iowa: Collegiate
Press Inc., 1938, 387 pp.





AN INVESTIGATION OF THE POSSIBILITIES OF
MEASURING PERSONALITY TRAITS WITH THE
STRONG VOCATIONAL INTEREST BLANK¹

LYLE TUSSING
Wilson Junior College

The Strong Vocational Interest Blank is a widely used
instrument for determining vocational interest patterns
in educational and vocational guidance. This study was contemplated
because it was believed that there was a possibility
of weighting items on the Interest Blank in such a way as to
obtain, with this single test, certain personality measures as
well as the vocational interest scores now available. With this
object in mind, the present study analyzes the relationship
between responses on the Strong Vocational Interest Blank
and scores on certain personality tests to determine how well
the traits measured by these tests can be measured by the
Strong Blank.

The idea of evaluating other factors than vocational interest
with the Strong Vocational Interest Blank is not new.
Strong has used his Blank to measure masculinity and femininity
(20) and also interest maturity (18, 19). Young and
Estabrooks (21) studied the relation of personality and interest
tests to scholastic success. They found that the Strong
Vocational Interest Blank showed evidence of being the most
significant predictive measure after they had made an item
analysis of several tests.

¹This study is a portion of a thesis submitted to the Faculty of Purdue
University in partial fulfillment of the requirements for the degree of Doctor
of Philosophy, June, 1941.




The purpose of the present study was to construct scoring
keys for the Strong Vocational Interest Blank to measure
certain personality traits, such keys being: (1) suitable for
use in group testing and available as another key for the
Strong Vocational Interest Blank, (2) so constructed that
scoring is both rapid and free from the personal error.
Validation of the scales was obtained by correlating the scores
resulting from the new Strong scoring keys with scores for the
corresponding Allport-Vernon, Bell, Bernreuter, and the
American Council on Education tests.

The matter of falsification of responses to test items is a
problem that is worthy of consideration. With most personality
tests, it is a very easy matter for the person being
tested to falsify his responses if he wishes (16). It has also
been shown that the Strong Vocational Interest Blank can be
changed to falsify interest in a vocation (11). However, in
most cases where the individual is interested in obtaining guidance,
he will not deliberately falsify his responses. Also it
seems that because of the number and brevity of the items in
the Strong Vocational Interest Blank, falsification of responses
by the subject possibly would be more difficult than in the
conventional personality inventory of the direct question type.

It was not deemed necessary in the course of this investigation
to examine the nature of personality traits, to define
them, nor to investigate all the possible traits. In this study,
the validity and reliability of the tests used was accepted, and
no attempt was made to prove or disprove whether the tests
selected were valid measures of the traits which they purported
to measure. (1-10 and IS) 


The exact number and the names of existent personality
traits have not been agreed upon. Allport (4) states that “The
Eugenics Record Office has issued a Trait Book containing a
list of approximately 3,000 characteristics that might conceivably
be hereditary according to the principle of unitary
characteristics.” Further he cites McDougall as listing five
elements of personality; Beck, four; and Boven, three. While





MEASURING PERSONALITY TRAITS


it is unlikely that authorities will agree as to the units of personality
and their exact number, nevertheless, measures of
“sociability”, “confidence in one's self”, “home adjustment”,
“health adjustment”, and “emotional adjustment”, as well as
intelligence, are quite widely measured factors of personality,
and consequently these measures have been selected for investigation
in the present study.

Procedure

In this study the problem was to determine how the items
on the Strong Vocational Interest Blank were related to the
scores made on the Allport-Vernon, Bell, and Bernreuter
personality tests, and on intelligence tests, and also to find whether
the Strong Blank could be used as a means of measuring the
same personality traits as the above-mentioned tests if the
items were weighted. In order to obtain weightings, it was
necessary to determine how the groups scoring at the extremes
of the measures (high, low) varied in their responses to items
on the Strong Vocational Interest Blank.

A sample group of 300 men was used for establishing the
weights. This group was one originally studied by Dr. E.
Lowell Kelly (14) in an investigation of factors contributing
to marital happiness.² Consequently, it was a group in which
several selective factors were operating. For example, each
man was about to be married. He was willing to cooperate in
a study in which such factors as intelligent curiosity and cooperativeness
play an important role. His general intelligence
and educational background were perhaps higher than the
average. Most of this group had attended college, and the
average age of the group was 26.66 with a standard deviation
of 3.47. However, even though these selective factors were
operative, the group nevertheless represented a relatively
heterogeneous group with respect to the variables under
analysis and therefore was satisfactory for the present investigation.

²Data for these studies were collected with the aid of a series of grants
from the Committee for Research on Problems of Sex of the National Research
Council.




The 300 subjects were given a battery of tests including
the Strong Vocational Interest Blank, the Allport-Vernon
Study of Values, the Bernreuter Personality Inventory, the
Bell Adjustment Inventory, and the Otis S-A Test of Mental
Ability. The present study utilizes their responses to these
tests as basic data.


Scores of the 300 men on the nine scales (four scales of
the Bell Adjustment Inventory; two of the Bernreuter Personality
Inventory, F1-C and F2-S; the Otis S-A Test of Mental
Ability; two scales, theoretical and economic, of the
Allport-Vernon Study of Values) were coded and punched on
cards. This set of cards was then sorted on each of the nine
scales to identify the subjects scoring in the highest 25 per
cent and the lowest 25 per cent of the scores on each scale.
In many instances, due to coding, it was impossible to select
exactly 25 per cent (75 cases), although this number was
desired in each of the two groups.
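
The difficulty of selecting exactly 75 cases from coded scores can be illustrated with a short sketch. The rule used here for tied scores (take every subject at or beyond the score of the 75th case) is an assumption made for illustration; the article does not say how the cutting points were actually fixed, and the subject numbers and scores are hypothetical.

```python
def extreme_groups(scores, fraction=0.25):
    """Split subjects into high and low groups on one coded scale.

    scores: dict mapping subject serial number -> coded score.
    Ties at a cutting score are all included (an assumed rule),
    so a group may exceed the desired size.
    """
    target = max(1, round(len(scores) * fraction))
    ordered = sorted(scores.values())
    low_cut = ordered[target - 1]   # score of the target-th lowest case
    high_cut = ordered[-target]     # score of the target-th highest case
    low = sorted(s for s, v in scores.items() if v <= low_cut)
    high = sorted(s for s, v in scores.items() if v >= high_cut)
    return high, low


# Hypothetical coded scores for eight subjects (illustration only).
coded = {1: 1, 2: 2, 3: 2, 4: 2, 5: 3, 6: 4, 7: 4, 8: 5}
high, low = extreme_groups(coded)
print("high group:", high)
print("low group:", low)
```

In this example the desired group size is two, but the tied scores force a low group of four and a high group of three, which is the kind of departure from 25 per cent described above.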

After these two extreme scoring groups were obtained, the
next step was that of comparing the responses of these groups
for each item on the Strong Blank. This was done by punching
the responses of the 300 subjects to the 420 items of the
1927 form of the Strong Vocational Interest Blank on Powers
cards. This punching consisted of recording the responses for
each individual on each item as to “like”, “indifferent”, or
“dislike” for the particular item. In some instances the subject
failed to respond to an item, but, since inspection showed
these omissions to be fairly equally distributed in the high and
low groups, it was considered advisable not to weight such
omissions.


After determining which individuals constituted the high
and low groups of a particular test, the cards containing the
Strong responses of the individuals composing each group
were hand sorted by serial number of the subject. Each resulting
high group of cards was then sorted by item, and the number
of “like”, “indifferent”, and “dislike” responses for each
of the 420 items in the high group was recorded. This



same procedure was followed in the low group. The percentages
were then obtained for each of the 420 items for
“like”, “indifferent”, and “dislike” for both the high and low
groups (2,520 percentages for each scale). This procedure
was followed for the nine scales.
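
The tabulation of percentages may be sketched as follows. The data layout and names are hypothetical, and the sketch assumes that percentages were based on the whole group, with omitted responses left in the base; the article does not state the base used.

```python
from collections import Counter

RESPONSES = ("like", "indifferent", "dislike")


def response_percentages(group_responses, n_items):
    """Percentage of a group giving each response to each item.

    group_responses: one list per subject, holding that subject's response
    to every item ("like", "indifferent", "dislike", or None if omitted).
    Returns a list with one dict of percentages per item.
    """
    group_size = len(group_responses)
    table = []
    for item in range(n_items):
        counts = Counter(subject[item] for subject in group_responses)
        table.append({r: 100.0 * counts[r] / group_size for r in RESPONSES})
    return table


# Hypothetical high group of four subjects answering two items.
high_group = [
    ["like", "dislike"],
    ["like", None],
    ["indifferent", "dislike"],
    ["like", "dislike"],
]
for number, row in enumerate(response_percentages(high_group, 2), start=1):
    print(number, row)

# 420 items x 3 responses x 2 groups gives the percentages needed per scale.
print(420 * len(RESPONSES) * 2)
```

The final line reproduces the count of percentages required for each scale when both the high and the low groups are tabulated.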

After the percentages had been computed for the high and
low groups on “like”, “indifferent”, and “dislike”, the difference
in the two percentages for “like”, “indifferent”, and “dislike”
for each item was compared and the differences in
percentage were given a weighting. These weightings ranged
from +4 to -4 according to the formula W = [...] 100
reported by Strong (12, 17) to be the most satisfactory
scheme he had found for assigning weights to his interest
scales.

These new weights were then transferred to the 1938
Strong Blank. Since the 1938 form included only 400 of the
original 420 items, some of the responses had to be omitted
in making this transfer. However, the number of changes was
small and probably did not appreciably affect the reliability or
validity of the new scales. The weights as they now appeared
on the 1938 form of the Strong Vocational Interest Blank
constituted a new scoring key. These weights were then transferred
to International Business Machines test-scoring keys.

A new sample of 103 male college freshmen and sophomores
was used as a validating group. These students took
the battery of tests consisting of the Strong Vocational Interest
Blank, the Allport-Vernon Study of Values, the Bernreuter
Personality Inventory, the Bell Adjustment Inventory, and the
American Council on Education Psychological Examination
voluntarily. Since considerable time was consumed by each
individual participating, not all of the men finished all of the
tests in the battery. However, as many subjects as possible
were utilized in the validation of each new scoring key.

The responses of the validating group on the Strong Voca¬ 
tional Interest Blank were then scored with each of the various 




new keys (i.e., the key for “economic”, “theoretical”, “intelligence”,
etc.). By scoring the odd and even items separately,
it was possible to compute the reliability for each new scale.
The scores made on the “odds” were correlated with those
made on the “evens”, and the reliability for the whole test
was estimated by means of the Spearman-Brown formula.
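
The odd-even procedure can be sketched as below. The correlation between half scores is stepped up to full length by the Spearman-Brown formula, r(whole) = 2r / (1 + r). The weighted item scores are hypothetical, and the function names are chosen only for this illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimate whole-test reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of weighted item scores per subject."""
    odd = [sum(s[0::2]) for s in item_scores]   # items 1, 3, 5, ...
    even = [sum(s[1::2]) for s in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


# Hypothetical weighted item scores for five subjects on a four-item key.
scores = [
    [1, 1, 0, 1],
    [-1, 0, -1, -1],
    [2, 1, 1, 2],
    [0, -1, 1, 0],
    [1, 2, 0, 0],
]
print(round(split_half_reliability(scores), 2))
```

The step-up is needed because each half contains only half of the weighted items, and a correlation between halves understates the reliability of the full key.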

The validity was obtained by correlating the total (odd
plus even) scores on the Strong Vocational Interest Blank;
for example, the total scores on “theoretical” with the theoretical
scores obtained on the Allport-Vernon Study of Values.
Results 

Table 1 indicates the number of weighted items on the
Strong Vocational Interest Blank. It also shows the results of
the validating group for the Bernreuter (F1-C, F2-S Scales),
Allport-Vernon (Theoretical, Economic Scales), Bell (Home,
Health, Social and Emotional Adjustment Scales), and Intelligence
as measured by the American Council on Education
Psychological Examination.


TABLE 1

RESULTS OF THE NEW STRONG BLANK SCALES

                                                   “Home  “Health “Social “Emot.
                  “F1-C” “F2-S” “Theor.” “Econ.”   Adj.”   Adj.”   Adj.”   Adj.”  “Intel.”

Mean              -17.4   19.5     9.3      4.2    -12.8    -5.0    48.5    -6.7    23.7
S. D.              5S.2   36.1    64.3     64.9     38.2    22.2    54.6    25.3    53.4
Reliability         .86    .77     .82      .80      .83     HO      .87     .77     .80
Validity            .48    .52     .48      .56      .21     .34     .50     .07     .45
No. Wgtd. Items     347    327     336      325      319     246     339     289     343
No. Items Wgtd.
  2 or more         130     70     143      118       54      24     108      34     129

[...] the new [...] same name or symbol as for
the original test, but is enclosed in quotation marks.




It is interesting to note the items on the Strong Blank
which compose these scales. Some of the better (more heavily
weighted) items on the Strong “F1-C” Scale (The Bernreuter
F1-C Scale purports to differentiate the self-conscious individual
from the self-confident individual) which appeared to
differentiate the self-conscious individual from the self-confident
person show that the former dislikes the occupation of
“aviator.” He dislikes the school subjects “physical training”
and “public speaking.” In the field of amusements, he likes
“picnics”, but dislikes “rough house invitations.” With reference
to activities, he dislikes “adjusting a carburetor”, “repairing
electric wiring”, “making a speech”, “organizing a play”,
“opening a conversation with a stranger”, “acting as a yell-leader”,
and “expressing judgments publicly regardless of
criticism.” In the section dealing with peculiarities of people,
he dislikes “people who assume leadership”, but likes “talkative
people.” He would rather listen to a story than tell a
story. In rating his present abilities and characteristics, he
does not usually start the activities of a group, work steadily,
or liven the group on a dull day. He is not quite sure of himself,
and he is doubtful as to his ability to accept criticism
without getting sore and to distinguish between more or less
important matters. He does not discuss his ideals with others,
and his feelings are easily hurt. He loses his temper at times
and worries considerably about mistakes.

Opposed to this, the self-confident individual likes “meeting
and directing people”, but he dislikes “talkative people.”
He is indifferent as to whether he would rather tell a story or
listen to a story, and he likewise has no preference in the
matter of a few intimate friends or many acquaintances. He
answers in the affirmative to “am quite sure of myself”, to
“accept criticism without getting sore”, and “discuss my ideals
with others.” He does not get rattled easily, and his feelings
are not easily hurt. The person who is self-confident “enters
into the situation and enthusiastically carries out the program”,
and he does not worry.




The Bernreuter F2-S Scale is a measure of sociability.
Some of the better (i.e., more heavily weighted) items on the
Strong “F2-S” which appeared to differentiate the non-social
individual from the social individual show that the non-social
individual likes to “deal with things” rather than people. He
would like to be a magazine writer, and he prefers “amusements
alone or with two or three others” rather than “amusement
where there is a crowd.” He would rather spend nights
at home than away from home, and he enjoys reading a book
rather than going to the movies. The non-social individual
reports that he has a “few intimate friends” rather than “many
acquaintances.” He says he does not “win friends easily”,
nor does he “usually liven up the group on a dull day.” He
“can write a concise, well-organized report.” Such an individual
“practically never tells jokes” and his “feelings are
easily hurt.”

The social individual is one who “tells jokes well.” He is
also a person who is indifferent in his choice between a “great
variety of work” or a “similarity of work.”

The Allport-Vernon Theoretical Scale was designed to
segregate individuals whose theoretical values in life are
highest. On the new Strong “Theoretical” Scale, there are a
number of more heavily weighted items which appear to differentiate
the theoretical individual from the non-theoretical
type. For example, in indicating the occupations he would
prefer, the former names as “likes” the following: “astronomer”,
“author of a technical book”, “chemist”, “inventor”,
“laboratory technician”, “marine engineer”, “scientific research
worker”, “statistician”, and “watchmaker.” The theoretical
person is fairly consistent in preferring such school subjects as
“calculus”, “geology”, “philosophy”, “physics”, “physiology”,
and “zoology.” The theoretical type of individual enjoys
“museums” and the “solving of mechanical puzzles”, also
“doing research work.” In the section of the Strong Blank
devoted to the order of preference of activities, he indicates
like for the development of the theory of operation of a



new machine and for the role of chairman of an educational
committee. He would have liked to have been a “Luther
Burbank” or a “Thomas Edison.” He likes “technical responsibility
(head of a department of 25 people engaged in
technical, research work)” as opposed to “supervisory responsibility
(head of a department of 300 people engaged in
typical business operation)”, and he would prefer “mental
activity” to “physical activity.” In rating his own abilities
and characteristics, the theoretical person checks in the affirmative
that he has mechanical ingenuity. In the division of
peculiarities of people, he dislikes “bolshevists.”

A few of the items which were more heavily weighted to
characterize the non-theoretical individual are that he would
like to be a “sales manager” and would have liked to have been
“John Wanamaker, merchant.” He dislikes the subject of
“chemistry.”

The economic man as described by Allport (0) is “characteristically
interested in what is useful. Based originally
upon the satisfaction of bodily needs (self-preservation), the
interest in utilities develops to embrace the practical affairs of
the business world. This type is thoroughly practical and conforms
well to the prevailing stereotype of the average American
business man.”


On the “Economic” Scale of the Strong Vocational Interest
Blank, some of the more heavily weighted items which
appeared to differentiate the economic man from the non-economic
man indicate that he likes the occupation of “sales
manager” and likes to “develop business systems.” He would
have liked being “Henry Ford, manufacturer”, “J. P.
Morgan, financier”, or “John Wanamaker, merchant.” This
“typical business man” is indifferent toward the pastime of
“reading a book” as compared with that of “going to the
movies.” He definitely dislikes the occupations of “artist” and
“author of a novel”, and has the same attitude toward the
subject of “literature.” He likewise records dislike for “absent-minded
people”, “socialists”, and “bolshevists.” He would


67 



EDUCATIONAL AND ESVCIIOLOClCAI, MI ASURKMEN r 


prefer “supervisory responsibility (head of a dcjiartnient of 
300 people engaged in typical business openitinn)" to "tech¬ 
nical responsibility." 

In contrast, the non-economic man indicates that he likes
such occupations as “clergyman”, “college professor”,
“magazine writer”, “poet”, “school teacher”, “sculptor”, and “social
worker.” In the same vein, the non-economic person likes the
subject “art” and likes “poetry.” He admires “Luther
Burbank, plant wizard”, “Charles Dana Gibson, artist”, and
“Booth Tarkington, author.” He likes amusements such as
“observing birds”, “visiting art galleries”, and “museums”,
and listening to “symphony concerts.”


In investigating the Bell Adjustment Inventory, the scales
for “Home Adjustment”, “Health Adjustment”, and
“Emotional Adjustment” on the Strong Blank showed relatively few
weightings of two or more. This dearth of heavily weighted
items and the large number of unit weight items in no apparent
pattern seems to indicate the inability of the Strong items to
differentiate people scoring high from those scoring low on
the Bell Home, Health, and Emotional Adjustment Scales.
This fact is substantiated by the very low validity coefficients
obtained.


Some of the more heavily weighted items which appear on
the “social adjustment” scale that differentiate the social from
the non-social type of person show that the former chose an
occupation such as “consul.” He liked subjects such as
“dramatics”, “literature”, and “public speaking.” The socially
adjusted person indicated a liking for activities such as
“interviewing clients”, “opening a conversation with a stranger”,
“making a speech”, “organizing a play”, “meeting and
directing people”, “taking responsibility”, “meeting new situations”,
and “entertaining others.” He stated that he liked
“quick-tempered people” and that he “tells jokes well.” He answered
[...] “usually start activities of my [...]”, “usually [...] done”,
[...] “quite sure of myself.”




The non-social individual preferred to be a “member of a
society” rather than an “officer in a society.” He would rather
“deal with things” than people. He would choose a “few
intimate friends” rather than “many acquaintances.” He would
rather “listen to a story” than “tell a story.” He stated that
he “worries considerably about mistakes”, his “feelings are
easily hurt”, and his “advice is practically never asked.” The
non-social person indicated a dislike for a “politician” and for
“expressing judgments regardless of criticism.”

An investigation of intelligence as measured by the American
Council on Education Psychological Examination shows
the following more heavily weighted items which appear to
differentiate the person of high intelligence from the
individual with a low intelligence rating. The former indicates a
liking for the occupations of “author of a novel”, “college
professor”, “editor”, and “magazine writer.” He dislikes the
occupations of “life insurance salesman” and “office clerk.”
The person of high intelligence indicates a preference for
“symphony concerts” and “poetry.” He would rather read a
book than go to a movie. He chooses the “Atlantic Monthly”
for reading and prefers the school subjects of “algebra”,
“calculus”, “geometry”, “literature”, and “philosophy.” He
enjoys “arguments”, the “teaching of adults”, and admires
“independents in politics.” He considers the most important
factor affecting his work an “opportunity to make use of all
knowledge and experience”, and he says that he “can write a
concise, well-organized report.”

The occupation “floorwalker”, the amusement “vaudeville”,
the subject “agriculture”, and the man “John Wanamaker,
merchant” are items which the individual of low
intelligence rates as liked. He dislikes the occupations of
“astronomer”, “author of a technical book”, “inventor”, and
the subject of “sociology.” This dislike is also evidenced in
the weighting of the items “Bolshevists”, “writing personal
letters”, and “Booth Tarkington, author.” The person with
a low intelligence rating indicates that he would prefer to
“work for self in small business” than to “work in a large
corporation with little chance of becoming president until age
55”; also, that he likes “many acquaintances” rather than a
“few intimate friends.”

Additional information was obtained on some of the new
scales. A correlation was found between grade point averages
for the first semester in school and the scores on the
intelligence scale. The correlation was .42 (N = 79). The validity
of this scale as a measure of scholastic aptitude is to be
compared with a correlation of .39 between grade point averages
and the scores on the American Council on Education
Psychological Examination for the validating group (N = 81).

A correlation was found between the scores made with the
Group II key (composed of mathematicians, engineers,
chemists, and physicists) (17) of the Strong Vocational Interest
Blank and the scores of the “theoretical” key. A correlation
of .80 was obtained. This high correlation would seem to
indicate that these two keys are to a large extent measuring
approximately the same thing.

In order to determine whether the weightings on this new
“economic” scale were similar to the Strong Key for Group
VIII (accountant, office man, purchasing agent, banker) (17),
a correlation was found for the scores of 89 subjects on the
two keys. A correlation of .21 was obtained from these data,
thus indicating that there is apparently little relationship
between these keys. It would seem that the “economic” man is a
different type of individual from the business man as
represented by the accountant, office man, purchasing agent,
banker group. The recent factor analysis study made by
Ferguson, Humphreys, and Strong (11) shows how eight of the
Strong scales (teacher, life insurance salesman, certified public
accountant, office worker, physician, lawyer, Y. M. C. A. [...]
The “economic” scale of this study may produce heavier
loadings and be an additional help for counseling in this area.




It seemed possible that a better estimation of the value of
the “social adjustment” scale could be made by comparing
resulting scores with social activity records. The ten students
of the validating group scoring highest on the new Strong
“social adjustment” scale were compared with the ten students
scoring lowest. A count was made of the number of activities
in which these students participated during the semester the
tests were taken. The students composing the “social” group
belonged to a total of seven organizations, while those making
up the “non-social” group belonged to only two.


Summary and Conclusion 

The purpose of this study was to investigate the possibility
of using the Strong Vocational Interest Blank to measure
certain personality traits previously measurable only by the use of
several other tests.

An analysis of the results showed that all of the scores
based on the new keys were positively correlated with the
traits measured. Validation coefficients based on a new group
of subjects were: Bernreuter F1-C, .48; Bernreuter F2-S, ..ii2;
Allport-Vernon Theoretical, .48; Allport-Vernon Economic,
.56; Bell Home Adjustment, .21; Bell Health Adjustment, .34;
Bell Social Adjustment, .50; Bell Emotional Adjustment, .07;
and the American Council on Education, Intelligence, .45. The
reliabilities of these keys ranged from .70 for the Health
Adjustment key to .87 for the Social Adjustment key.

In presenting the findings of this investigation, it should
be remembered that the items of the Strong Vocational
Interest Blank were not designed to be used as elements of a
personality test. However, the weighted items of the Strong
Blank give a fairly good picture of some of the factors making
up a particular trait, e.g., Spranger’s “economic” man (as
measured by the Allport-Vernon test) and the weighted items
on the new Strong “economic” key.

From the data gathered, the reliabilities indicate that the
new Strong keys are fairly consistent in the material they are
measuring. It also appears that some traits can be measured
with more accuracy by the Strong Blank than other traits. It
does not seem advisable to continue work in the fields of
“home adjustment”, “health adjustment”, or “emotional
adjustment” with the Strong Vocational Interest Blank
because of low validities obtained in these fields.

From this study, however, it does appear that a prediction
of self-confidence and of sociability can be made with a fair
amount of accuracy with the Strong Blank, and that types of
individuals such as “theoretical” and “economic” (business
man) can be determined fairly well by using the Strong keys
for measurement of these traits. “Intelligence” scores based
on responses to the Strong Blank show as high a correlation
with college success as the American Council on Education
Psychological Examination scores. This would indicate that in
spite of relatively low validity coefficients when correlated with
scores on the original tests, others of the new scales may be
found to possess considerable practical validity in the
evaluation of socially significant behavior.


REFERENCES 

1. Allport, F. H. Social Psychology. New York: Houghton Mifflin
Company, 1924. Chap. V, VI, XIV.

2. Allport, G. W. “A Test for Ascendance-Submission”, Journal of
Abnormal and Social Psychology, XXIII, (1928), 118-136.

3. Allport, G. W. “Concepts of Trait and Personality”,
Psychological Bulletin, XXIV, (1927), 284-293.

4. Allport, G. W. Personality. New York: Henry Holt and
Company, 1937, pp. 236-237, 303-304.

5. Allport, G. W. and Vernon, P. E. Manual of Directions for a
Study of Values. Chicago: Houghton Mifflin Company, (1931).

6. Manual for American Council on Education Psychological
Examination. Washington: American Council on Education, (1940).

7. Bell, Hugh M. Manual for The Adjustment Inventory, Stanford
University Press, California, (1934).

8. Bernreuter, R. G. Manual for The Personality Inventory,
Stanford University Press, California, (1938).

9. Chambers, O. R. “Character Trait Tests and Prognosis of
College Achievement”, Journal of Abnormal and Social Psychology,
XX, (1925), 303-311.

10. Conklin, E. S. “Three Diagnostic Scorings for the Thurstone
Personality Schedule”, Indiana University Publications, Science
Series, No. 6, (1937).

11. Ferguson, L. W., Humphreys, L. G., and Strong, F. W. “A
Factorial Analysis of Interests and Values”, Journal of Educational
Psychology, XXXII, (1941), 197-204.

12. Kelley, T. L. “The Scoring of Alternative Responses with
Reference to Some Criterion”, Journal of Educational Psychology, XXV,
(1934), 504-510.

13. Kelly, E. L., Terman, L. M., and Miles, C. C. “Ability to
Influence One’s Score on a Typical Paper and Pencil Test of
Personality”, Character and Personality, IV, (1936), 206-215.

14. Kelly, E. L. “A Preliminary Analysis of Psychological Factors in
Assortative Mating”, Psychological Bulletin, XXXIV, (1937),
749.

15. Otis, A. S. Manual for Self-Administering Tests of Mental
Ability, Yonkers-on-Hudson, New York: World Book Company,
(1922).

16. Steinmetz, H. C. “Measuring Ability to Fake Occupational
Interest”, Journal of Applied Psychology, XVI, (1932), 123-130.

17. Strong, E. K., Jr. Manual for Vocational Interest Blank for Men,
Stanford University Press, California, (1940).

18. Strong, E. K., Jr. “Procedure for Scoring an Interest Test”,
Psychological Clinic, XIX, (1930), 63-72.

19. Strong, E. K., Jr. “Interest Maturity”, Personnel Journal, XII,
(1933), 77-90.

20. Strong, E. K., Jr. “Interests of Men and Women”, Journal of
Social Psychology, VII, (1936), 49-67.

21. Young, C. W. and Estabrooks, G. H. “Reports on the
Young-Estabrooks ‘Studiousness Scale’ for Use with Strong Vocational
Interest Blank for Men”, Journal of Educational Psychology,
XXVIII, (1937), 176-187.



A STUDY OF THE GENTRY VOCATIONAL
INVENTORY


CLIFFORD FROEHLICH

State of North Dakota Occupational Information and Guidance Service

The Vocational Inventory developed by Curtis G. Gentry,
Director of Guidance and Secondary Education, Public
Schools, Knoxville, Tennessee, contains 4d4 questions.¹
Three hundred and eighty-four are in the Inventory
proper, and the remainder constitute a personality
inventory. The Inventory proper, according to Gentry’s statement,
“classifies the applicant’s strengths and weaknesses with
reference to these eight major groups”²: (1) social service, (2)
literary work, (3) business, (4) law and government, (5) art,
(6) mechanical designing, (7) mechanical construction, and
(8) science. Gentry states in the Manual of Directions that
the Inventory was constructed after “a fruitless search for an
instrument which would yield a general vocational over-view
of pupils and young adults.”² The Inventory began to take
shape in 1921 and has undergone many revisions since that
time. It was copyrighted and published in its present form
in 1940.

This study of the Vocational Inventory is based upon
results obtained by administering it to 815 seniors in the high
schools of Cass County, North Dakota. A majority (72 per
cent) of the students were from a single school located in the
metropolitan area of the county. In all cases the test was
administered by persons specially trained in tests and
measurements and was given under favorable testing conditions.

¹C. G. Gentry. Vocational Inventory, (Nashville: Educational Test Bureau,
1940).

²C. G. Gentry. Manual of Directions, Vocational Inventory, (Nashville:
Educational Test Bureau, 1941).

A statistical summary of scores of 1,000 Knoxville High
School seniors is reported in the manual. Table 1 shows the
total number of these students who received the highest scores
in each of the groups as well as the percentages as reported
by Gentry. The possible range of scores is from zero to 180.
The highest score made by each student indicates the field for
which greatest interest and ability are possessed.

TABLE 1

NUMBER OF HIGH SCORES BY GROUPS AS REPORTED BY GENTRY³

GROUP        I     II    III     IV      V     VI    VII   VIII
Total      168    135    143    184     99     23    172     76
Percent   16.8   13.5   14.3   18.4    9.9    2.3   17.2    7.6

A similar tabulation was made of the 815 students that
were tested in Cass County. Table 2 gives the results of this
tabulation. Separate tabulations are given by sexes.

TABLE 2

NUMBER OF HIGH SCORES BY GROUPS, 815 CASS COUNTY SENIORS

GROUP               I     II    III     IV      V     VI    VII   VIII
Girls   Total      60     33     41    123     60      3      0     85
        Percent  14.8    8.2    10.   30.4   14.8    0.7    0.0    21.
Boys    Total       5      5     42     42     13     53    179     71
        Percent   1.2    1.2   10.3   10.3    3.2   12.7   43.7   17.2
Total              65     38     83    165     73     56    179    156
        Percent   7.9    4.6   10.2   20.2    8.9    6.9   21.9   19.4


It is evident from a comparison of girls and boys in Table 
2 that there is a significant sex difference which must be taken 
into account in interpreting the results of this test although 
Gentry makes no statement in the Manual regarding sex dif¬ 
ferences. It is also interesting to note that when the scores of 
boys and girls are combined, there is still a wide discrepancy 
between the percentages as reported by Gentry and the per¬ 
centages as found in the present study. 
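The sex difference noted above can be checked directly from the counts in Table 2. The following is a minimal sketch, assuming the counts are read correctly from the table; the variable and function names are illustrative only and are not part of the original study.

```python
# Counts of highest scores by group (I to VIII), transcribed from Table 2.
GROUPS = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]
girls = [60, 33, 41, 123, 60, 3, 0, 85]
boys = [5, 5, 42, 42, 13, 53, 179, 71]


def percentages(counts):
    """Return each count as a percentage of the total, to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


combined = [g + b for g, b in zip(girls, boys)]
girls_pct = percentages(girls)
boys_pct = percentages(boys)

# Print the two distributions side by side with the boys-minus-girls gap.
for name, g, b in zip(GROUPS, girls_pct, boys_pct):
    print(f"Group {name:>4}: girls {g:5.1f}%  boys {b:5.1f}%  difference {b - g:+6.1f}")
```

Percentages recomputed in this way may differ from the printed ones in the last decimal, since the rounding rule used in the original tabulation is not stated.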

The question may be raised as to whether separate norms
should be established for each sex in instances where sex
differences are as pronounced as this study reveals. There is a
considerable difference of opinion on this point. Those who
maintain that combined norms are justified base their opinion
on the fact that the type and number of jobs open to women
differs materially from those open to men. To them,
establishing separate norms presumes that as many women as men
would be advised to go into the field of mechanical
construction, for example, when it is evident that many more men
than women should be so advised. However, this position is
not tenable for the writer. The establishment of separate
norms would not necessarily mean that as many women as men
would be guided into the mechanical construction field, but it
would emphasize to the users of the inventory that sex
differences do exist and that they must be taken into account.
No instrument of this type can in any way supplant the
necessity for every counselor’s having a knowledge of the job
demands and opportunities for both men and women in all
occupational categories.

³Ibid., p. 8.

Question 383 of the Inventory asks the student to “name
three vocations which appeal to you at the present time. Name
your first choice first, if you have a first choice.”⁴ The writer
noted, while using this instrument in a clinical situation, that
there seemed to be a large proportion of the students whose
claimed first choice of an occupation was in the same group
as their highest score on the test. In order to check this
relationship statistically, the student’s personal choice was
classified into one of the eight groups according to the
“Psychological Classification of Occupations” as given by
Gentry. The contingency coefficient for student’s first choice
and the highest score on the Inventory for 338 female high
school seniors is .718. The C on a similar comparison for 300
male high school seniors is .733. When these two groups are
combined, C equals .751. This correlation raises the question
of whether the Inventory, in a large measure, may not simply
be a re-statement of the student’s personal choice. An
examination of the scattergrams for the correlations given above
indicates that 51 per cent of the cases fall on the axis,
indicating that 51 per cent of the high school seniors have a
personal vocational choice that is in the field of their highest
score. Darley⁵ reports a maximum contingency coefficient of .5
between the claimed occupational interest types and measured
occupational interest patterns for approximately 1,000 cases,
with the Strong test, compared to .751 revealed by this study.
Darley felt that he obtained a contingency coefficient as high
as .5 only because he forced both the claimed and measured
interest types into broad categories.

⁴C. G. Gentry. Op. cit., p. 27.
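For readers unfamiliar with the statistic, the contingency coefficient used above is Pearson's C, computed from chi-square on a cross-tabulation of two classifications. The following is a minimal sketch, not the writer's own computation; the 3 x 3 table is hypothetical and stands in for the scattergrams, which are not reproduced in the article.

```python
import math


def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty rows or columns contribute nothing
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))


def max_contingency(k):
    """Upper limit of C for a k-by-k table: sqrt((k - 1) / k)."""
    return math.sqrt((k - 1) / k)


# Hypothetical cross-tabulation of claimed choice (rows) by highest
# score (columns); these are NOT data from the present study.
example = [
    [30, 5, 5],
    [5, 30, 5],
    [5, 5, 30],
]
c = contingency_coefficient(example)
ceiling = max_contingency(8)  # eight groups, as in the Inventory
print(f"C = {c:.3f}; ceiling for an 8 x 8 table = {ceiling:.3f}")
```

Because C cannot reach 1.00 in a table of finite size, a reported value is best read against the ceiling for the number of categories used, which is why forcing data into broad categories, as Darley notes, affects the coefficient obtained.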

This close agreement between the scores on the Inventory
and the student’s stated choice may merely mean that he has,
by a long and thorough process, come to such complete
knowledge of occupational requirements that he reached the same
conclusion that the Inventory revealed in a comparatively
short time. However, the individuals upon whom this study
is based were members of school populations where no
organized effort had been made to provide occupational information,
and only a few of the students had related work experience.
Furthermore, the counselors who interviewed the students
following the administration of the Inventory found that in the
majority of the cases the students had a very meager
knowledge of the requirements of their stated vocational choices.

In the Manual, Gentry states, “An analysis of the
Inventory returns on this sampling indicates that the second highest
score is usually earned in an occupational group related to the
first.”⁶ Tables 3, 4, and 5 show the relationship of the highest
score to the second highest score.

TABLE 3

RELATIONSHIP OF HIGHEST TO SECOND HIGHEST SCORE
328 High School Females

(Columns: group in which student attained highest score.
Rows: group in which student attained second highest score.)

Group        I     II    III     IV      V     VI    VII   VIII   Total
I            0      8      5     17      7      0      0     25      62
II           9      0      5     20      9      0      0      2      45
III          6     14      0     36     10      0      0     10      76
IV          12      4     12      0     15      0      0     21      64
V            4      3      2     13      0      1      0      5      28
VI           0      0      0      1      7      0      0      1       9
VII          0      0      0      0      0      0      0      2       2
VIII         9      3     14     16      0      0      0      0      42
TOTAL       40     32     38    103     48      1      0     66


TABLE 4

RELATIONSHIP OF HIGHEST TO SECOND HIGHEST SCORE
289 High School Males

(Columns: group in which student attained highest score.
Rows: group in which student attained second highest score.)

Group        I     II    III     IV      V     VI    VII   VIII   Total
I            0      0      1      0      0      0      0      1       2
II           0      0      2      2      0      0      0      1       5
III          0      2      0     14      3      2      4      7      32
IV           0      0     10      0      1      2     13      4      30
V            0      2      0      0      0      1      2      2       7
VI           1      0      0      2      3      0     59      7      72
VII          1      0     10     12      2     36      0     28      89
VIII         1      0      3      6      2      6     34      0      52
TOTAL        3      4     26     36     11     47    112     50


TABLE 5

RELATIONSHIP OF HIGHEST TO SECOND HIGHEST SCORE
617 Males and Females Combined

(Columns: group in which student attained highest score.
Rows: group in which student attained second highest score.)

Group        I     II    III     IV      V     VI    VII   VIII   Total
I            0      8      6     17      7      0      0     26      64
II           9      0      7     22      9      0      0      3      50
III          6     16      0     50     13      2      4     17     108
IV          12      4     22      0     16      2     13     25      94
V            4      5      2     13      0      2      2      7      35
VI           1      0      0      3     10      0     59      8      81
VII          1      0     10     12      2     36      0     30      91
VIII        10      3     17     22      2      6     34      0      94
TOTAL       43     36     64    139     59     48    112    116


In some cases, Gentry’s statement seems to be correct.
However, in many cases it is difficult to find justification for
his statement. For example, Group IV (Business), as shown
in Table 5, indicates that 50 people earned their second score
in Group III (Law and Government); 22 in Group II
(Literary); 22 in Group VIII (Science); 17 in Group I (Social
Service); 13 in Group V (Art); and 12 in Group VII
(Mechanical Construction). It appears that in 64 per cent of
the cases the second highest scores will fall in some other
group besides Group III. This is certainly a wider distribution
than is indicated by the statement “usually earned in an
occupational group related to the first.”

⁵J. G. Darley, Clinical Aspects and Interpretation of the Strong Vocational
Interest Blank (New York: The Psychological Corporation, 1941).

⁶C. G. Gentry. Op. cit., p. 4.
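The 64 per cent figure can be recovered from the Group IV column of Table 5. A brief sketch follows, assuming the column is read as 17, 22, 50, 0, 13, 3, 12, and 22 for second-highest Groups I through VIII; the variable names are illustrative.

```python
# Students whose HIGHEST score fell in Group IV (Table 5, column IV),
# keyed by the group of their SECOND highest score.
second_highest_given_iv = {
    "I": 17, "II": 22, "III": 50, "IV": 0,
    "V": 13, "VI": 3, "VII": 12, "VIII": 22,
}

total = sum(second_highest_given_iv.values())
share_iii = 100 * second_highest_given_iv["III"] / total
share_elsewhere = 100 - share_iii

print(f"Column total: {total}")
print(f"Second score in Group III: {share_iii:.0f} per cent")
print(f"Second score in any other group: {share_elsewhere:.0f} per cent")
```

The same tally may be repeated for any other column of Tables 3 to 5 by substituting the corresponding counts.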




The inference of Gentry’s statement, it would seem, is that
there are fairly high intercorrelations among certain scores if
the second score is related to the first. Such overlapping, of
course, is undesirable from a standpoint of measurement
efficiency. The finding that such relationship is low therefore
is a point in favor of the blank, even though it discredits
Gentry’s statement of close relationship.

In the Manual of Directions, Gentry assumes that interest
will drive the person to acquire certain objective information
or proficiency in the field of interest. According to Fryer’s⁷
summary of the literature in the field, this so-called objective
approach to the measurement of interest by the way of
achievement or proficiency has never been unusually successful.

According to the statement of Gentry, the test is designed
to measure the student’s “strengths and weaknesses.”
Assuming that ability to deal with linguistic concepts and
achievement in English are necessary requisites for success in the
literary field, the following study was made. The Literary
Group scores of 176 high school senior girls, selected at
random, were compared with the scores made on the Cooperative
English Test, Form O.M., and the L-score of the American
Council on Education Psychological Examination, 1938
Edition. The correlation between the English achievement
score and the L-score was .73, with a P.E. of .022. A
correlation of .74 with a P.E. of .018 was obtained by Beyers⁸ for
these same measures on 500 college freshmen. The correlation
between English achievement and the Literary Key on the
Gentry Inventory was .589 with a P.E. of .033. The
correlation was .605, with a P.E. of .055, between the L-score and the
Literary Key. A partial correlation with the Gentry score
held constant was .58. With the L-score held constant the
correlation between the Literary Key and English achievement
was .095; with the English score held constant the correlation
between the Literary Key and the L-score was .297.

⁷Douglas Fryer, The Measurement of Interest (New York: Henry Holt
and Company, 1931).

⁸Otto Beyers, “Report of the Freshmen Testing Program, N. D. A. C.,
Fargo, North Dakota”, 1940.

These correlations seem to indicate that, to some extent,
the Gentry test is measuring some factor which is not
measured by the English and intelligence examinations. There is a
question as to just what is being measured; possibly the high
relationship between the personal choice and measured choice
may offer one explanation. A common factor of experience in
an academic high school may be another factor.

When intelligence is held constant, there seems to be little
relationship between the English and the Gentry scores for the
Literary Key. It is evident that the Literary Key gets at very
little related to English achievement which is not already
measured by an intelligence test. Yet in The Student’s Manual
that accompanies the test, the author states concerning this
group that “one entering an occupation in the above group
should like literature and English. He (or she) should be very
much interested in composition, in writing articles, poems,
themes, and reports.”⁹

In making the above analysis to test the relationship of a
single key to known objective measures, the writer does not
contend that an interest inventory should show a marked
positive relationship with such measures. On the contrary, an
interest inventory can hardly be justified if it adds nothing to
the scores of other measures. Though the findings here do not
entirely discredit the usefulness of the Vocational Inventory
as a measure of interest in the literary area, they do tend to
disprove Gentry’s claim that he is measuring “strengths and
weaknesses.”

While this study does not present conclusive evidence, the
results at least indicate that this test should be more carefully
standardized and evaluated before it is used in a counseling
situation. The writer feels that the scores it now yields are
of questionable value to the average guidance worker.
Refinement of procedures and further occupational standardization
and follow-up data are necessary before the counselor can
regard this inventory as a valid diagnostic tool.

⁹C. G. Gentry. Individual Analysis Report. (Nashville: Educational Test
Bureau, p. 6, 1940).



THE RELATIONSHIP OF THE AFFECTIVE TOLERANCE
INVENTORY TO OTHER
PERSONALITY INVENTORIES


ROBERT I. WATSON
College of the City of New York


In presenting any new personality inventory it is
important to establish the nature and extent of
relationship that it bears to other scales designed to measure similar
aspects of personality. The following pages present data
on the relationship of the Watson-Fisher Inventory of
Affective Tolerance to other standardized measures.¹

In constructing most inventories of emotional stability,
items successfully used by others were usually selected from
the clinical literature and combined with additional items for
preliminary tryout with a more or less pragmatic outlook.
If they “hung together” on internal validation and
differentiated between extreme groups, a new inventory was
constructed.

The Inventory of Affective Tolerance originated from a
somewhat more theoretical framework. Items were collected
with certain theoretical predilections as guiding principles.
Statements of “symptom” were included in the preliminary
tryout only if they seemed to be appropriate to this trait
of affective tolerance. As a result, many so-called
conventional items were not included. A brief description of this
point of view follows.

An individual’s affective tolerance is judged to be his
capacity to handle his affective tensions: his capacity to adjust
to affective disturbances. Among the chief aspects of affective
tolerance are the capacities to withstand or endure emotional
tension, to vent or discharge emotional tension, and to govern
or direct emotional tension. It is the aggregate of these that
the Inventory of Affective Tolerance purports to measure.
A more complete discussion, including a description of the
validation of the inventory, is given in an article by Fisher
and Watson (3).

¹This present report is considered as preliminary to a study by means of
some factor technique.

The Relationship to the Watson-Fisher Inventory of Affective
Potency, to the Willoughby Personality Schedule,
and to the Bernreuter Personality Inventory

The subjects employed in this section of the inquiry
consisted of 55 boys and 97 girls, students at the University of
Idaho, Southern Branch, who were described in connection
with previous papers (3) (10). Besides the Inventory of
Affective Tolerance, certain other measures, including
measures of general aptitude,² the Watson-Fisher Inventory of
Affective Potency, the Willoughby Personality Schedule, and
the Bernreuter Personality Inventory, were administered to
these students during the same semester.

The Inventory of Affective Potency purports to
measure the trait of affective arousability, the strength and
duration of our everyday affective responses (10). The
Willoughby Personality Schedule is designed to measure “neurotic
tendencies” (11). The Bernreuter Personality Inventory is
designed to measure several aspects of personality. In view
of the analyses of Flanagan (4) and Lorge (8), attention
will be given to but two of these measures, designated by
the symbols F1-C and F2-S in the inventory manual (2). The
first is a measure of confidence in oneself. Persons scoring
low tend to be self-confident and well adjusted, and individuals
scoring high tend to have feelings of inferiority. The second
of these measures is one of sociability. At one extreme
individuals tend to be non-social, at the other, gregarious.

²[...] general aptitude [...] respectively, -.04 and -.02 for 50 boys
and [...] with the American Council on Education Psychological [...]
Q and Total respectively, .04, .03, and .0[...] for the [...] boys,
and .0[...], .10, and .06 for the girls.

TABLE 1

THE CORRELATION BETWEEN THE INVENTORY OF AFFECTIVE TOLERANCE
AND THE WATSON-FISHER INVENTORY OF AFFECTIVE POTENCY, THE
WILLOUGHBY PERSONALITY SCHEDULE, AND THE BERNREUTER
PERSONALITY INVENTORY

                               Tolerance
                         Boys                Girls
Measure               N     r     k       N     r     k

Watson-Fisher        55   −.14   .99     97   −.21   .98
Willoughby           47   −.70   .71     94   −.69   .73
Bernreuter F1-C      46   −.66   .75     87   −.56   .83
Bernreuter F2-S      46   −.13   .99     87   −.11   .99


Table 1 gives the correlation between the scores on these
inventories and scores on the tolerance inventory together with
the corresponding coefficients of alienation. The correlations
with the Willoughby and the F1-C or confidence factor of the
Bernreuter test are substantial, ranging between −.56 and
−.70. The correlations considerably exceed the minimum
demanded at the one per cent level of significance (7). The
correlations with the Inventory of Affective Potency and the
sociability factor, F2-S, of the Bernreuter test are not
significant at the one per cent level.

It would appear, then, that affective tolerance, as herein 
described and measured, bears considerable relation to self- 
confidence and lack of neurotic tendency, but little relation 
to sociability or affective potency. 

This substantial relationship, however, should not be interpreted
to mean that the tolerance inventory is so closely related
to these other scales as to be superfluous for the measurement
of individual differences. Consideration of the coefficients
of alienation is pertinent in this connection. Reduction
of the standard error of estimate by one-half, as expressed
by a k of .50, requires that r be .866. The smallest coefficient
of alienation found here is .71. Garrett (5) states that
“For r’s of .80 or less the coefficients of alienation are clearly
so large that predictions of individual scores based upon the
regression equation are little better than a ‘guess’.” Although
a substantial relation does exist between scores on the affective
tolerance inventory and scores on these other inventories, it is
evident that the Inventory of Affective Tolerance measures
something other than whatever is measured by these two
inventories.
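
The argument from the coefficient of alienation can be checked numerically. The sketch below assumes the standard textbook definition, k = sqrt(1 - r^2), which the article uses without stating; the function names are illustrative only and are not taken from the article.

```python
import math


def alienation(r: float) -> float:
    """Coefficient of alienation, assuming the usual definition k = sqrt(1 - r^2)."""
    return math.sqrt(1.0 - r * r)


def r_needed_for(k: float) -> float:
    """Size of correlation required to bring the coefficient of alienation down to k."""
    return math.sqrt(1.0 - k * k)


if __name__ == "__main__":
    # Sizes of correlation discussed in the text (signs dropped, since k depends on r^2).
    for r in (0.56, 0.60, 0.70, 0.80):
        k = alienation(r)
        print(f"r = {r:.2f}   k = {k:.2f}   reduction in error of estimate = {100 * (1 - k):.0f}%")
    print(f"r needed for k = .50: {r_needed_for(0.50):.3f}")
```

Under this definition even a correlation of .70 leaves the standard error of estimate at roughly seven-tenths of what it would be with no predictor at all, which is the point of the passage quoted from Garrett.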

The Relationship to the Colgate Personal Inventory C2 and
the Bell Adjustment Inventory

Fifty-nine white female student nurses, who were tested
while receiving three months of their training at the Idaho
State Mental Hospital (South) at Blackfoot, Idaho, took the
Inventory of Affective Tolerance, the Colgate Personal Inventory
C2, and the Bell Adjustment Inventory.* The background
of these subjects is summarized in Table 2.


TABLE 2

DESCRIPTION OF THE BACKGROUND OF FIFTY-NINE STUDENT NURSES

                                        Mean       σ       Range

Age                                     23.25     2.56     19-33
School Year Completed                   12.61     1.09     12-10
Months of Training Completed            28.87     4.50     12-33
Otis Self-Administering Test Scores     45.90     9.18     25-65
Moss Nursing Aptitude Test Scores      139.90    30.02     41-186


The Colgate Personal Inventory C2 is designed to measure
traits of introversion-extroversion (6). The Bell Adjustment
Inventory measures home, health, social, emotional, and
occupational adjustment. The higher the score in this inventory,
the more unsatisfactory the adjustment.

The product-moment correlations between the Inventory
of Affective Tolerance and the other inventories are presented
in Table 3. All correlations are negative and all but two
are significant at the one per cent level (7). The correlation
with the Colgate Personal Inventory C2 is not quite
significant, and the correlation with the Bell Health Score is
entirely negligible. Health adjustment and introversion-extroversion
are apparently not significantly correlated with affective
tolerance. The remaining correlations with Bell scores
range from −.40 to −.60. Affective tolerance bears some
relation to home, social, emotional, vocational, and total
adjustment as measured by the Bell Inventory.

TABLE 3

THE CORRELATIONS BETWEEN THE INVENTORY OF AFFECTIVE TOLERANCE
AND THE COLGATE PERSONAL INVENTORY C2 AND
THE BELL ADJUSTMENT INVENTORY

                           Tolerance
Measure                 N      r      k

Colgate C-2            59    −.30    .95
Bell Home              59    −.40    .92
Bell Health            59    −.11    .99
Bell Social            59    −.60    .80
Bell Emotional         59    −.54    .84
Bell Vocational        45    −.50    .87
Bell Total¹            59    −.56    .83

¹ Vocational items are omitted since not all subjects took the form
of the Bell Inventory that includes this section.

* We wish to acknowledge the courtesy of Mr. Barney Bybee, then
psychometrician at the Idaho State Mental Hospital, South, in
supplying these scores.

Although there is evidence that some degree of relationship
exists, nevertheless the coefficients of alienation, which
are also reported in Table 3, do not encourage the view
that the two inventories can be used interchangeably. The
smallest coefficient of alienation is .80, which implies 20 per
cent efficiency in prediction.

Commonality of Items 

An attempt was made to find out what items the Watson-Fisher
inventory and the other personality measures had in
common. Such an inquiry is pertinent in a field of investigation
marked by repetition of items from inventory to inventory.
Does the approach from the theoretical point of view
earlier expressed result in the selection of the same items as
the more pragmatic approach?

Two degrees of similarity of content can be distinguished.
The first category adopted included items in which the content
apparently did not differ in scope, e. g., item 4 of the
tolerance blank, “... in the background at social gatherings”,
as compared to item 118 of the Bernreuter, “Do you
keep in the background at social functions?” In the second
category were items in which the similarity was one of a whole
to a part or a part to a whole, e. g., item 60 of the tolerance
inventory, “I control my feelings of grief or sorrow”, as compared
to item 35 of the Bell, “Are you easily moved to tears?”
The two categories will be referred to as “similar” and “partially
similar,” respectively.

The items of the tolerance inventory were checked against
the four other personality measures used in one or another
of the two samples studied for similar and partially similar
items. The results of this subjective, somewhat crude analysis
are presented in Table 4.


TABLE 4

THE SIMILARITY OF CONTENT OF ITEMS OF CERTAIN PERSONALITY
MEASURES TO THE 61 ITEMS CONTAINED IN THE
INVENTORY OF AFFECTIVE TOLERANCE

                                  Partially     Similar or
                     Similar      Similar       Partially Similar
Measure              N    %       N    %        N    %

Bernreuter          16   26       2    3       18   30
Willoughby           7   11       5    8       12   20
Laird C-2            6   10       4    7       10   16
Bell                14   23       4    7       18   30


It is apparent that similarity of content expressed in this 
fashion does occur. However, not more than 30 per cent of 
the items of the Inventory of Affective Tolerance appear in 
any one of these measures. 

The commonality of items can be expressed in another
fashion, namely, the total number of the 61 tolerance items
appearing in one or more of the other measures as either a
“similar” or a “partially similar” item. There were 27 such
items. It is evident, then, that many of the items reported
as similar in the data of Table 4 were found in two or more
of the other blanks. In all, 34 items or 56 per cent of the
items in the tolerance inventory do not appear in the other
four blanks. Apparently, then, there is no great similarity
of items in the tolerance inventory and those contained in the
other inventories studied.

Conclusions 

1. Affective tolerance bears substantial relationship to
confidence in oneself, lack of neurotic tendency, and social and
emotional stability as measured by the scales used.

2. Little or no relation is found between affective tolerance
and affective potency, sociability, or health adjustment as
measured by the scales used.

3. No relations are found to be of such a magnitude as to
make the Inventory of Affective Tolerance superfluous, since
the smallest coefficient of alienation is .71.

4. More than half the items contained in the Inventory
of Affective Tolerance do not appear in the other personality
measures employed.

REFERENCES

1. Bell, H. M. Manual for the Adjustment Inventory: Adult Form.
Stanford University: Stanford University Press, 4 pp.

2. Bernreuter, R. G. Manual for the Personality Inventory.
Stanford University: Stanford University Press, 6 pp.

3. Fisher, V. E. and Watson, R. I. “An Inventory of Affective
Tolerance”, Journal of Psychology, XII (1941), 149-157.

4. Flanagan, J. C. Factor Analysis in the Study of Personality.
Stanford University: Stanford University Press, 1935. 103 pp.

5. Garrett, H. E. Statistics in Psychology and Education. New York:
Longmans, 1937. 493 pp.

6. Laird, D. A. General Information and Directions for Using the
Colgate Tests of Emotional Outlets. Hamilton: Hamilton
Republican, 4 pp.

7. Lindquist, E. F. Statistical Analysis in Educational Research.
Boston: Houghton-Mifflin, 1940. 266 pp.

8. Lorge, I. “Personality Tests by Fiat. I. The Analysis of the Total
Trait Scores and Keys of the Bernreuter Personality Inventory”,
Journal of Educational Psychology, XXVI (1935), 273-278.

9. Otis, A. S. Manual of Directions and Key: Otis Self-Administering
Tests of Mental Ability. Yonkers-on-Hudson: World Book,
12 pp.

10. Watson, R. I. and Fisher, V. E. “An Inventory of Affective
Potency”, Journal of Psychology, XII (1941), 139-148.

11. Willoughby, R. R. Directions: Willoughby (Clark-Thurstone)
Personality Schedule. Providence: Author, 2 pp.



THE INFLUENCE OF TRAINING ON MECHANICAL 
APTITUDE TEST SCORES 


RICHARD W. FAUBION and EARLE A. CLEVELAND
Air Corps Technical Training Command

and

THOMAS W. HARRELL
University of Illinois


The present study is designed to investigate the influence
of training on scores of the Mechanical Movements and
Surface Development Tests (similar to many so-called aptitude
tests), which look highly susceptible to training. In
previous studies Harrell and Faubion found the Mechanical
Movements and Surface Development Tests to be valid in
predicting course grades of student airplane mechanics. The
Surface Development Test correlated .55 with composite airplane
mechanics grades for 84 men (1), and .47 with a
slightly different composite for 105 other men (2). For the
same two groups the correlations of the Surface Development
Test with mechanical drafting and blueprint reading were .54
and .50, respectively. The Mechanical Movements Test correlated
.39 and .26 with composite grades in the same two
groups.

These tests developed by Thurstone are fairly familiar
(3). The Mechanical Movements Test requires an individual
to figure out how machine parts, particularly gears and pulleys,
work. The Surface Development Test involves matching
similar parts for drawings shown in two dimensions and in
three-dimensional perspective.

The procedure was to study two groups which were
matched for mental test scores but which differed in the
amount of mechanical training they had received. The two
groups were each composed of 100 soldiers and were matched
on the basis of scores on a mental test similar to the Henmon-Nelson.
One group consisted of Air Corps recruits who had
not as yet been entered in any of the technical courses offered
by the Air Corps Technical Schools. The second group was
composed of airplane mechanics students who had just finished
a six-week basic training course in mechanical drafting
and blueprint reading, elements of metalwork, elements of
electricity, shop mathematics, and air corps fundamentals. The
mean raw score for the untrained recruits on the mental test
was 57.0, as compared with a mean raw score of 57.3 for the
trained group. The difference between the two means is only
one-fourth the standard error of the difference and consequently
is quite insignificant. The variability of the two groups
is also similar, a standard deviation of 8.96 being found for
the recruits as against a standard deviation of 8.97 for the
students.
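
The "one-fourth the standard error" statement can be reproduced from the figures just given. The sketch below assumes the ordinary formula for the standard error of a difference between the means of two independent groups; the article does not say which formula the authors used, and the function name is illustrative.

```python
import math


def se_of_difference(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Standard error of the difference between two independent means (assumed formula)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# Mental test figures quoted in the text: 100 recruits and 100 trained students.
se = se_of_difference(8.96, 100, 8.97, 100)
difference = 57.3 - 57.0
ratio = difference / se

if __name__ == "__main__":
    print(f"standard error of the difference: {se:.2f}")
    print(f"difference / standard error:      {ratio:.2f}")
```

With these inputs the standard error comes out a little under 1.3 and the ratio a little under one-fourth, in line with the statement in the text.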

The two groups were not intentionally paired for previous
mechanical experience or training, but were selected at random
from groups of Air Corps recruits and Air Corps Technical
School students, respectively.

Special attention will be given to the mechanical drafting
and blueprint reading course, as well as the elements of metalwork
course, because of their apparent similarity to the tests.
Mechanical drafting and blueprint reading, a forty-hour
course, had the following outline:

a. Fundamental principles of mechanical drafting. 

b. Exercises in orthographic projection. 

c. Development of surfaces.

d. Blueprint reading. 

e. Exercises in blueprint reading. 

Elements of metalwork, a sixty-hour course, had the following
outline:

a. Properties and uses of the common metals. 


b. The care and use of the common tools needed in the
repair and manufacture of small parts.

c. Metalwork—projects in drilling, filing, thread cutting, 
reaming, etc.

d. Soldering—soft and hard soldering. 

e. Brazing. 

The results given in Table 1 show no significant differences.
The trained men had a score of 18.4 on the Surface Development
Test as compared with 17.9 for the recruits. This difference
is less than half the standard error of the difference.


TABLE 1

COMPARISON BETWEEN 100 RECRUITS AND 100 SOLDIERS TRAINED IN
BASIC AIRPLANE MECHANICS

                            Mean for    Mean for                   Sigma of
                            Recruits    Trained Men   Difference   Difference

Mental Test                   57.0        57.3           0.3          1.2
Surface Development           17.9        18.4           0.5          0.9
Mechanical Movements
  (No. Right)                 32.2        32.5           0.3          1.2
Mechanical Movements
  (Rt. − W)                   18.2        17.6           0.6          1.7


The Mechanical Movements Test scores were treated in
two ways: (1) Number right and (2) Rights minus wrongs.
Since the number of choices varies from one question to
another, the use of a simple correction formula would be
questionable. The trained group had a mean score of .3 of a
point higher for the number right. The recruits had a mean
score .6 of a point higher when the correction formula was
used. Neither of these differences is as large as half the
standard error of the difference.

These results indicate that six weeks of intensive training
in mechanical courses do not significantly increase mechanical
aptitude test scores, even where the test is very similar to the
activities carried out in the training. This is strikingly true
of the Surface Development Test, in which the items resemble
mechanical drafting and blueprint reading work. No conclusion
can be drawn as to how far this result can be generalized;
possibly longer training or earlier training would show a
significant increase. The authors wish to point out, however,
that the present results are contrary to statements often made
about a mechanical aptitude test.

REFERENCES

1. Harrell, T. W. and Faubion, R. W. “Selection Tests for Aviation
Mechanics”, Journal of Consulting Psychology, IV (1940),
104-105.

2. ------. “Primary Mental Abilities and Aviation Maintenance
Courses”, Educational and Psychological Measurement, I (1941),
59-66.

3. Thurstone, L. L. Primary Mental Abilities. Psychometric
Monographs, No. 1. Chicago: University of Chicago Press, 1938. 128 pp.



NEW TESTS*


American School Achievement Tests, by Robert V. Young,
Willis E. Pratt, and Frank Gatto. 1941. Forms A and B.
Primary Battery I, for grade 1. Time, 35 minutes, $2.50
per 100; 3c each; specimen set 25c. Primary Battery II,
for grades 2 and 3. Time, 85 minutes, $4.00 per 100; 5c
each; specimen set 30c. Published by the Public School
Publishing Company, 509-513 North East Street,
Bloomington, Illinois.

American School Reading Readiness Test, by Robert V.
Young, Willis E. Pratt, and Carroll A. Whitmer. For
kindergarten and grade 1. Time, about 30 minutes. Form
A, $4.00 per 100; 5c each; specimen set 25c. Published
by the Public School Publishing Company, 509-513 North
East Street, Bloomington, Illinois.

Arithmetical Reasoning Test, by Alfred J. Cardall. 1941.
For academic and technical prediction. For 12th grade
level and above. Time, 40 minutes. Forms A and B, 5c
each; specimen set 20c. Published by Science Research
Associates, 1700 Prairie Avenue, Chicago, Illinois.

Chicago Tests for Primary Mental Abilities, by L. L. Thurstone
and Thelma Gwinn Thurstone. 1942. For ages 11
to 17. Time, 40 minutes for each of six booklets, $5.00
per 25 sets; $9.00 per 50 sets; $15.00 per 100 sets; $70.00
per 500 sets; specimen set $1.00; supplementary supplies
additional. Published by the American Council on Education,
744 Jackson Place, Washington, D. C.

College English Test, National Achievement Tests, by A. C.
Jordan. 1941. For high school seniors and college freshmen.
Time, about 45 minutes. Forms A and B, $2.50 per
25; 100 or more copies 7½c each. Published by the Acorn
Publishing Company, Rockville Centre, Long Island, New
York.


*Prepared by Jane Gilbert. 




Detroit Alpha Intelligence Test, by Harry J. Baker. 1941.
For grades 4 to 1. Time, 32 minutes. Forms S and T,
$3.50 per 100; 4c each; specimen set 15c. Published by
the Public School Publishing Company, 509-513 North
East Street, Bloomington, Illinois.

English Minimum Essentials Test, by J. C. Tressler. Revised
1941. For grades 8 to 12. Time, about 40 minutes. Forms
A, B, and C, 75c per 25; 4c each; specimen set 10c.
Published by the Public School Publishing Company, 509-513
North East Street, Bloomington, Illinois.

Every-Day Life, by Leland H. Stott. 1941. To measure three
factors in self-reliance. For high school students. Time,
about 30 minutes. Hand- or machine-scored. $4.00 per
100; $2.25 per 50; specimen set 15c; machine-scoring
answer sheets, $2.00 per 100 up to 500. Published by the
Sheridan Supply Company, P. O. Box 837, Beverly Hills,
California.

Furbay-Schrammel Social Comprehension Test, by John H.
Furbay and H. E. Schrammel. 1941. For high school and
college students, and adults. Time, 50 minutes. Form A,
$1.70 per 25; 7c each; specimen set 15c. Published by the
Bureau of Educational Measurements, Kansas State
Teachers College, Emporia, Kansas.

Interest Inventory for Elementary Grades, by Mitchell Dreese
and Elizabeth Mooney. 1941. For grades 4, 5, and 6.
Time, about 30 minutes. Published by Center for Psychological
Service, George Washington University, Washington,
D. C.

Inventory of Social Behavior, by Ellis Weitzman. 1941. For
ages 16 to 25. Time, about 20 minutes. $4.00 per 100;
$2.25 per 50; specimen set 15c. Published by the Sheridan
Supply Company, P. O. Box 837, Beverly Hills, California.


Iowa Every-Pupil Tests of Basic Skills, Form M, by H. F.
Spitzer, Ernest Horn, Maude McBroom, H. A. Greene,
and E. F. Lindquist. 1941. Test A, Silent Reading
Comprehension; Test B, Work-Study Skills; Test C, Basic
Language Skills; Test D, Basic Arithmetic Skills.
Elementary Battery for grades 3 to 5. Time, about 85 minutes.
$1.15 per 25 for each test; $3.75 per 25 for complete
battery. Advanced Battery for grades 5 to 8. $1.25
per 25 for each test; $4.00 per 25 for complete battery.
Published by Houghton Mifflin Company, 2 Park Street,
Boston, Massachusetts.


Iowa Placement Examinations, English Training, Form M,
constructed by M. F. Carpenter, G. D. Stoddard, and L.
W. Miller; revised by M. F. Carpenter and D. B. Stuit.
Revised 1941. For college students. Time, 45 minutes.
Hand- and machine-scored. $4.00 per 100; machine-scoring
answer sheets 1½c each; specimen set 20c. Published
by the Bureau of Educational Research and Service,
State University of Iowa, Iowa City, Iowa.

Iowa Placement Examinations, Foreign Language Aptitude,
Form M, constructed by G. D. Stoddard; revised by Grace
Cochran, J. R. Nielson, and D. B. Stuit. Revised 1941.
For college students. Time, 45 minutes. Hand- and
machine-scored. $4.00 per 100; machine-scoring answer
sheets 2c; specimen set 20c. Published by the Bureau of
Educational Research and Service, State University of
Iowa, Iowa City, Iowa.

Iowa Placement Examinations, Physics Aptitude, Form M,
constructed by G. D. Stoddard and C. J. Lapp; revised by
C. J. Lapp and D. B. Stuit. Revised 1941. For college
students. Time, 50 minutes. Hand- and machine-scored.
$4.00 per 100; machine-scoring answer sheets 1½c each;
specimen set 20c. Published by the Bureau of Educational
Research and Service, State University of Iowa, Iowa
City, Iowa.

Kansas Spelling Test, by H. E. Schrammel, O. M. Rasmussen,
and Wayne Gordon. 1941. Test I for grades 1 to 3; Test
II for grades 4 to 6; Test III for grades 7 to 9. Time,
15 minutes. Forms A and B, 50c per 25; specimen set 15c.
Published by the Bureau of Educational Measurements,
Kansas State Teachers College, Emporia, Kansas.



Language Essentials Tests, by Vera Davis and H. E. Schrammel.
1941. For grades 4 to 8. Time, 30 minutes. Forms
A and B, $1.00 per 25; $4.00 per 100; $36.00 per 1000;
specimen set 25c. Published by the Educational Test Bureau,
Minneapolis, Minnesota.

McDougal General Science Test, by Clyde R. McDougal.
1941. For high school students. Time, 40 minutes each
for Test I and Test II. 50c per 25; 2½c each; specimen
set 15c. Published by the Bureau of Educational Measurements,
Kansas State Teachers College, Emporia, Kansas.

Primary Business Interests, by Alfred J. Cardall. 1941. For
high school, college and adult levels. Time, about 20 minutes.
Hand- or machine-scored. 5c each; scoring keys 25c;
specimen set 35c. Published by Science Research Associates,
1700 Prairie Avenue, Chicago, Illinois.

Primary Reading Tests, by Albert G. Reilley. 1941. For
grade 1. Form B, 85c per 25. Published by Houghton
Mifflin Company, 2 Park Street, Boston, Massachusetts.

Recreation Inquiry, by Richard Wilkinson and Sidney L.
Pressey. For high school and college students. Time,
about 50 minutes. $1.00 per 25; $3.00 per 100; specimen
set 15c. Published by the Psychological Corporation, 522
Fifth Avenue, New York City.

Wiksell-Filkin Library Instructional Tests, by Wesley Wiksell
and Mary Filkin. 1941. For high school and college students.
25 separate tests each on a different subject. $3.75
per 25 complete batteries. Published by the Acorn Publishing
Company, Rockville Centre, Long Island, New
York.

Wilson Scales of Stability and of Instability, by Matthew H.
Wilson. 1941. For junior and senior high school and
college students, and adults. Time, 20 to 30 minutes.
$1.15 per 25; 5c each; specimen set 15c. Published by the
Bureau of Educational Measurements, Kansas State
Teachers College, Emporia, Kansas.




MEASUREMENT ABSTRACTS* 


Baxter, Brent. “An Experimental Analysis of the Contributions
of Speed and Level in an Intelligence Test.” Journal
of Educational Psychology, XXXII (1941), 285-96.

Three measurements were taken of performance on an
intelligence test: speed, the time to complete the entire test;
power, the number of items correct at the end of a given time;
and level, the number of items correct with unlimited time.
Speed and level are uncorrelated. “Speed and level contribute
the entire variance of power.” Level is more important than
speed in determining college grades. When the students are
tested individually, prediction (of Army Alpha scores, of college
aptitude test scores, and of college grades) “through the
combination of speed and level in multiple correlation is greater
than that possible” with the more usual scores of power. When
the students are tested in groups, this superiority vanishes.
H. M. Wolfle.


Carroll, John B. “A Factor Analysis of Verbal Abilities.”
Psychometrika, VI (1941), 279-308.

A multiple-factor analysis was made of a battery of 42
tests of verbal abilities administered to 119 college adults.
Where necessary, the distributions of test scores were normalized
before the inter-test correlations were computed.
Thurstone's M (Memory or Rote Learning) factor has been
confirmed, but his V (Verbal Relations) factor seems to have
been split into two or possibly three factors, C, J, and G. His
W (Word Fluency) factor has been split into two factors, A
and E. The C factor seems to represent the richness of the
individual’s stock of linguistic responses, and the J factor
seems to involve the ability to handle semantic relationships.
No satisfactory interpretation can as yet be made of the G
factor. The A factor seems to correspond to the speed of
association for common words where there is a high degree of
restriction as to appropriate responses. The E factor is described
as an associational facility with verbal material where
the only restriction is that the responses must be syntactically
coherent. The new factors are: F, facility and fluency in oral
speech; H, facility in attaching appropriate names or symbols
to stimuli; and D, speed of articulatory movements. (Courtesy
Psychometrika.)

* Edited by Professor Forrest A. Kingsbury.

Chapanis, Alphonse. “Notes on the Rapid Calculation of
Item Validities.” Journal of Educational Psychology,
XXXII (1941), 297-304.

Several shortcuts in the estimation of item validities by
means of the biserial correlation coefficient are suggested.
When one is not interested in inter-test comparisons, the con¬ 
stant '^may be eliminated. By employing the same class in¬ 
terval and assumed mean i may be omitted and the means in 
the formula replaced by deviations of guessed means. Moreover,
if the range of abilities tested is homogeneous, z is unnecessary.
Formulae facilitating transformation of reduced
coefficients are presented in case inter-test comparisons are
later desired. F. Brown.

Crissy, William J. E. “A Reply to an Examinee's Reactions to
the National Teacher Examinations.” Journal of Higher
Education, XII (1941), 484-487.

This article takes each of the particular criticisms in turn
and indicates how each problem of test construction was
handled in compiling the National Teacher Examinations. The
advantages of the particular test form used are listed and
year-to-year comparability of test scores is claimed. The
construction of test items and scoring key is briefly outlined. The
purpose of the examinations and the use of the test results are
considered and precautions taken in this area are pointed out.
D. A. Peterson.

Ferguson, George A. “The Factorial Interpretation of Test
Difficulty.” Psychometrika, VI (1941), 323-30.

This paper discusses the influence of test difficulty on the
correlation between test items and between tests. The greater
the difference in difficulty between two test items or between
two tests, the smaller the maximum correlation between them.
In general, the greater the number of degrees of difficulty
among the items in a test or among the tests in a battery, the
higher the rank of the matrix of intercorrelations; that is,
differences in difficulty are represented in the factorial
configuration as additional factors. The author suggests that if
all tests included in a battery are roughly homogeneous with
respect to difficulty, existing hierarchies will be more clearly
defined and psychological interpretation will be more meaningful.
(Courtesy Psychometrika.)
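
The statement that a difference in difficulty caps the attainable correlation can be illustrated for two right-wrong items. The sketch below assumes the correlation is the phi (fourfold point) coefficient, which the abstract does not specify; for pass proportions p1 <= p2 the largest attainable phi is sqrt(p1(1 - p2) / (p2(1 - p1))), reached when everyone who passes the harder item also passes the easier one. The function name is illustrative.

```python
import math


def max_phi(p_a: float, p_b: float) -> float:
    """Largest possible phi coefficient between two right-wrong items.

    p_a and p_b are the proportions passing each item (strictly between 0 and 1).
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("pass proportions must lie strictly between 0 and 1")
    harder, easier = sorted((p_a, p_b))
    return math.sqrt(harder * (1.0 - easier) / (easier * (1.0 - harder)))


if __name__ == "__main__":
    # Hypothetical pass proportions, chosen only to show the trend.
    for pair in ((0.5, 0.5), (0.5, 0.8), (0.2, 0.8)):
        print(pair, round(max_phi(*pair), 2))
```

Two items of equal difficulty can correlate perfectly, while the ceiling drops steadily as the pass proportions move apart.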

Flanagan, John C. “A Preliminary Study of the Validity of
the 1940 Edition of the National Teacher Examinations.”
School and Society, LIV (1941), 59-64.

Because conclusive validation is not yet possible, this report
aims to do no more than to review evidence now available
respecting the 1940 National Teacher Examinations. Two
meanings of “validity” are distinguished: (1) Do the tests
satisfactorily get at the content and mental processes indicated
in the outline and specifications by which they were planned?
(2) Do they aid in distinguishing between better and poorer
teachers as measured by any of numerous criteria of teacher
excellence? The former is partially answered by the agreement
of the 10 or 12 cooperating experts who critically examined
the tests and their specifications, and by intercorrelations
between tests of the battery; the latter, by citation of
several lines of evidence. One line is student ratings on 49
teachers in 22 systems (at least 2 in each system) who had
taken these tests and were chosen so as to reveal considerable
spread of scores on the “common examinations.” Correlation
between ratings and test-scores was .51. Supervisors' ratings
on these teachers on a number of items are also cited. The
five highest of these correlated around .50 with test-scores.
In general, the study shows that the examinations have some
predictive value as to the teacher’s general effectiveness and
desirability, and also points out other significant items. F. A.
Kingsbury.

Froehlich, Gustav J. “A Simple Index of Test Reliability.”
Journal of Educational Psychology, XXXII (1941), 381-85.

A simple adaptation of the Kuder-Richardson index of test
reliability is described, namely:

$$\frac{\sigma^{2} n - M(n - M)}{\sigma^{2}(n - 1)}$$

Since this formula involves only the number of items in the
test, the mean of the test scores, and their standard deviation,
it is offered as an index of test reliability easily applied by
teachers and others who are limited with respect to time and
statistical background. An empirical check, using the Wisconsin
Achievement Test on some 2000 individuals, shows
that reliability coefficients on the total battery and five parts,
as computed by this formula, run slightly lower (.017 to .058)
than Spearman-Brown r’s, the two rank orders of the six r's
being identical. F. A. Kingsbury.
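
The index described in this abstract needs only three summary figures, so it is easy to sketch in code. The example below assumes n is the number of items, M the mean score, and σ the standard deviation of the scores, taken here as the population standard deviation (the abstract does not say which form was used). The function name and the scores are hypothetical.

```python
from statistics import mean, pvariance


def simple_reliability(n_items: int, scores: list[float]) -> float:
    """Index of the form (var * n - M * (n - M)) / (var * (n - 1)).

    var is the population variance of the scores (an assumption),
    M their mean, and n the number of items in the test.
    """
    m = mean(scores)
    var = pvariance(scores)
    return (var * n_items - m * (n_items - m)) / (var * (n_items - 1))


if __name__ == "__main__":
    # Hypothetical scores of five examinees on a ten-item test.
    scores = [2.0, 4.0, 5.0, 6.0, 8.0]
    print(round(simple_reliability(10, scores), 3))
```

Because the index uses no item-level data, it can be computed from a class record sheet alone, which is the convenience the abstract stresses.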


Gritten, Frances and Johnson, Donald M. “Individual Differences
in Judging Multiple-Choice Questions.” Journal
of Educational Psychology, XXXII (1941), 423-430.

Form A of the Nelson-Denny Vocabulary Test was given
with instructions not to guess; Form B with instructions to
answer all questions and to rate each judgment on a confidence
scale. Four different achievement scores from Form A were
correlated with the confidence and achievement scores from
Form B. The results indicated that with instructions not to
guess, the more confident subjects will attempt and correctly
answer more items, and that the conventional formula,
$R - \frac{W}{n - 1}$, could properly be called a correction
for individual differences in confidence. V. Brown.
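
The conventional formula named in this abstract is simple enough to state as code. In the sketch below R is the number of right answers, W the number of wrong answers, and n the number of choices per item; omitted items count as neither. The function name and the figures are hypothetical.

```python
def corrected_score(rights: int, wrongs: int, n_choices: int) -> float:
    """Conventional correction formula: R - W / (n - 1)."""
    if n_choices < 2:
        raise ValueError("an item needs at least two choices")
    return rights - wrongs / (n_choices - 1)


if __name__ == "__main__":
    # Hypothetical examinee: 40 right, 12 wrong, the rest omitted, five-choice items.
    print(corrected_score(40, 12, 5))
    # With two-choice items the formula reduces to rights minus wrongs.
    print(corrected_score(30, 10, 2))
```

An examinee who omits doubtful items loses nothing under this scoring, while one who answers them is penalized for each error, which is why the abstract can read the formula as a correction for differences in confidence.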


Haggerty, Lida Harmer. “An Empirical Evaluation of the 
Accomplishment Quotient: A Four Year Study at the 
Junior High School Level.” Journal of Experimental Education, 
X (1941), 78-90. 

“The AQ is a distinctly unreliable measure.” This conclusion 
was reached after studying data for 163 subjects over a 
four-year period. Intelligence was measured eight times, using 
four different tests, and achievement four times, combining 
Forms V and W of the New Stanford Achievement Test. 
“There is a mean inter-r of only .35 for the eight AQ distributions,” 
in spite of duplication among the measures. “Large 
numbers of pupils (seem to) achieve up to capacity or fall 
below capacity without the slightest change in their actual 
work by merely changing from one accepted test to another in 
measuring intelligence.” 

A composite of all achievement scores correlates .94 with 
a composite of all intelligence scores. H. M. Wolfle. 


Hartmann, George W. “A Critique of the Common Method 
of Estimating Vocabulary Size, Together with Some Data 
on the Absolute Word Knowledge of Educated Adults.” 
Journal of Educational Psychology, XXXII (1941), 351-358. 

Vocabulary estimates based upon samples of varying size 
drawn from the same dictionary were found to be fairly stable, 
although results indicated that fewer than fifty words do not 
yield an accurate measure. When samples chosen from dictionaries 
of varying size were compared, vocabulary estimates 
were discovered to be dependent upon the size of the dictionary. 
Commonly accepted estimates need upward revision, 
for the present study demonstrated that the recognition vocabulary 
of the average undergraduate is in excess of 200,000 
words. V. Brown. 
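
The dependence on dictionary size follows from the arithmetic of the sampling method. The sketch below assumes the common procedure of multiplying the proportion of sampled words recognized by the number of entries in the dictionary; the function name and all figures are hypothetical and are not drawn from the study.

```python
def estimate_vocabulary(known_in_sample, sample_size, dictionary_size):
    """Scale the proportion of sampled words known up to the whole dictionary."""
    if sample_size <= 0 or not 0 <= known_in_sample <= sample_size:
        raise ValueError("known_in_sample must lie between 0 and sample_size")
    return known_in_sample * dictionary_size / sample_size


# The same proportion known (60 of 100) gives very different totals
# depending on the size of the dictionary that was sampled.
print(estimate_vocabulary(60, 100, 100_000))
print(estimate_vocabulary(60, 100, 400_000))
```

In practice the proportion known would itself fall as the dictionary grows to include rarer words, but the estimate remains tied to the size of the source sampled.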


Horst, Paul, and collaborators. The Prediction of Personal 
Adjustment. New York: Social Science Research Council, 
pp. xii+455. 1941. 

This monograph is a study of the logic and methodology 
of the prediction of personal adjustment, prepared under the 
supervision of the Committee on Social Adjustment of the 
Social Science Research Council. It is oriented primarily toward 
studies in prediction of adjustments in four fields, namely, 
school success, vocation, marriage, and crime, with a supplementary 
memorandum on problems of prediction in the national 
defense program. Five supplementary studies by collaborators 
are included dealing with case-study techniques, mathematical 
and tabulation techniques, reduction of number of 
variables (factor grouping), combining and weighting measures, 
and five mathematical problems. In the systematic section, 
detailed descriptions of tests and other instruments are 
omitted, and the major methodological aspects of the prediction 
problems are summarized and analyzed in order. A 
chapter is devoted to suggestions for research projects in the 
prediction of individual behavior. F. A. Kingsbury. 


Jackson, K. W. B. “Some Difficulties in the Application of the 
Analysis of Covariance Method to Educational Problems.” 
Journal of Educational Psychology, XXXII (1941), 414-422. 

Since the analysis of covariance is a statistical method 
developed mainly for use in another field, it seems inadvisable 
to apply it without question to educational problems. Although 
the method has been found very useful, it may be necessary to 
modify it slightly. Examples are given to illustrate some of 
the difficulties likely to be encountered in the adoption of this 
method and to demonstrate their possible solutions. V. Brown. 


Langsam, Rosalind Streep. “A Factorial Analysis of Reading 
Ability.” Journal of Experimental Education, X (1941), 
57-63. 

The factors involved in reading ability were determined 
using Thurstone’s centroid method with rotation of axes. 
Twenty different subtests from reading and intelligence tests 
and one from the Primary Mental Abilities battery were 
analyzed. Three of the four factors found in reading tests 
were identified as being similar in character to three of Thurstone’s 
primary abilities: a verbal factor V, “an ability to deal 
with verbal material”; a perceptual factor P, which in this 
material shows up as speed in “perceiving and selecting the 
correct word from other words offered as possible answers”; 
and a word factor W, “a fluency in dealing with words.” A 
more tentative factor was “that of seeing relationships.” H. M. 
Wolfle. 


Lennon, Roger T. “Note on Line of Relation Method of 
Establishing Age or Grade Norms.” Journal of Educational 
Psychology, XXXII (1941), 389-90. 

Two methods of establishing age or grade norms are: (1) 
to find mean scores for successive age or grade groups and 
pass a norm line through them; (2) to determine empirically 
the correspondence of scores on the new test with those on a 
test whose “line of relation” has already been established and 
interpolate norms on such a line of relation. The author points 
out the condition under which the two methods yield identical 
results; namely, when the correlation of scores on each of the 
two tests with age is the same. The correspondence method is 
applicable only when this condition is known to be satisfied. 
F. A. Kingsbury. 

McCormick, Thomas Carson. Elementary Social Statistics. 
New York: McGraw-Hill Book Company, pp. 353. 1941. 

This elementary statistics textbook is designed primarily 
for students and workers in sociology rather than in psychology 
and education. The first part of the book deals with 
the nature and control of statistical inquiry. The second part 
is devoted to common statistical procedures: tabulation of 
distributions, graphs, measures of deviation, correlation techniques, 
sampling and sampling errors, the significance of differences, 
and analysis of time series. Statistical tables are also 
included at the end of the book. Jane Gilbert. 


McNamara, John Joseph. “A New Method for Testing Advertising 
Effectiveness Through Eye Movement Photography.” 
Psychological Record, IV (1941), 399-460. 

In order to test advertising effectiveness one group of 
readers was asked to leaf through a magazine containing 
advertising matter. Their eye-movements were photographed 
with the Purdue Eye-Camera and the mean time spent on each 
part of an advertisement was recorded. The magazine was 
an advance copy so the readers had no opportunity to see the 
copy prior to the experiment. Another group was given advertisements 
which had been cut up into parts and pasted on 
cardboard in heterogeneous order. The length of time the 
reader required to identify the parts with the whole advertisement 
was recorded. The probability that the reader would 
look at the advertisement long enough to identify the advertiser 
was also computed. The reliability of these techniques 
was high. The correlation between mean time and probability 
scores was .48 for combined groups. The effect of magazine 
position, position on the spread, and cartoons on advertising 
effectiveness was also determined. Jane Gilbert. 

Peters, Charles C. and Van Voorhis, Walter R. Statistical 
Procedures and Their Mathematical Bases. New York: 
McGraw-Hill Book Company, pp. 516. 1941. 

This textbook is designed to explain the mathematical 
origins of statistical formulae in terms that can be understood 
by statistical workers having little mathematical background. 
Many of the Fisher techniques are discussed in relation to 
theoretical statistics, although the limitations of these techniques 
as applied in the psychological and social sciences are 
also indicated. The authors present a brief discussion of calculus 
and elementary statistical procedures, measures of central 
tendency, variability, reliability, probability, multiple-factor 
analysis, curve fitting, partial and multiple correlation, the 
nature of Chi squared, and the techniques used in controlled 
experimentation. Jane Gilbert. 


Remmers, H. H. and House, J. Milton. “Reliability of Multiple-choice 
Measuring Instruments as a Function of the 
Spearman-Brown Prophecy Formula, IV.” Journal of 
Educational Psychology, XXXII (1941), 372-76. 

The hypothesis tested is that the relation between changes 
in reliability of multiple-choice test-items and changes in number 
of response alternatives per test-item is predictable by the 
Spearman-Brown formula. A 60-item, five-alternative, multiple-choice 
arithmetic test was given to 771 junior high school 
pupils. Three derivative forms were constructed from this, 
having respectively four, three, and two alternative answers 
per item, the eliminated answers having been selected by lot. 
Four groups, equated as to I.Q., took separate forms of the 
test. Reliability coefficients of half-test (odd-even) and whole-test 
both showed regular decrease for the four forms with 
decreasing number of alternatives, thus supporting the 
hypothesis within this range of two to five alternative responses. 
F. A. Kingsbury. 
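
For reference, the Spearman-Brown formula in its usual form predicts the reliability of a test lengthened k times from the reliability r of the original: kr / (1 + (k − 1)r). The sketch below illustrates that form only. Treating the ratio of response alternatives as the factor k is an assumption made here to illustrate the hypothesis, not a procedure quoted from the article, and all figures are hypothetical.

```python
def spearman_brown(reliability, k):
    """Predicted reliability when the test is changed in length by the factor k."""
    if k <= 0:
        raise ValueError("the lengthening factor must be positive")
    return k * reliability / (1 + (k - 1) * reliability)


# Stepping up a half-test (odd-even) coefficient of .60 to the whole test.
print(round(spearman_brown(0.60, 2), 3))

# Assumed analogy: dropping from five alternatives to two, taken as k = 2/5.
print(round(spearman_brown(0.80, 2 / 5), 3))
```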


Remmers, H. H. and Sageser, H. W. “Reliability of Multiple-choice 
Measuring Instruments as a Function of the 
Spearman-Brown Formula, V.” Journal of Educational 
Psychology, XXXII (1941), 445-51. 

The hypothesis tested is the same as that described in the 
Remmers and House article, abstracted above. Two equivalent 
attitude-scales (Remmers & Bues) of 37 items were 
combined and used in testing attitudes of agreement or disagreement 
on each of two college practices. Four derivative 
sets were prepared, providing respectively two, three, five, and 
seven degrees of choice. From 87 to 112 university students 
filled out each of these forms. Each of the two tests was 
scored twice, once with equal values for all statements, once 
with items weighted by scale-value. Dividing each of the four 
sets of scores into the two original 37-item scales, four correlations 
were found for each set of papers. Corrected for 
skewness by being transformed into “z” functions, the obtained 
reliabilities with unweighted scores did not support the 
hypothesis; but when weighted in terms of the experimentally 
determined scale values of the scale-items, the data supported 
the hypothesis. F. A. Kingsbury. 


Ruch, Floyd L. “The Problem of Measuring Morale.” Journal 
of Educational Sociology, IV (1941), 221-228. 

At the present time one of the more effective tools developed 
to give an orderly description of public response is 
the opinion poll. The two basic problems in public opinion 
polling are: to get a representative sample, and to get the 
desired information from every case in the sample. The 
problem of sampling is discussed and references are given for 
specific techniques in the field. In the second problem basic 
types of defects are listed which lower the dependability and 
accuracy of questions. In conclusion possible additions to the 
public opinion poll technique in the measurement of morale are 
discussed. D. A. Peterson. 


Satterthwaite, Franklin E. “Synthesis of Variance.” Psychometrika, 
VI (1941), 309-16. 

The distribution of a linear combination of two statistics 
distributed as is Chi-square is studied. The degree of approximation 
involved in assuming a Chi-square distribution is illustrated 
for several representative cases. It is concluded that 
the approximation is sufficiently accurate to use in many practical 
applications. Illustrations are given of its use in extending 
the Chi-square, the Student “t” and the Fisher “z” tests 
to a wider range of problems. (Courtesy Psychometrika.) 
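
The abstract does not state the approximation itself. As a hedged illustration, the sketch below uses the form in which the result is commonly given today: a linear combination of independent variance estimates is treated as Chi-square with approximate degrees of freedom (Σ aᵢsᵢ²)² / Σ((aᵢsᵢ²)² / νᵢ). The function name and the figures are hypothetical.

```python
def approximate_df(coefficients, variances, dfs):
    """Approximate degrees of freedom for sum(a_i * s_i^2)."""
    if not (len(coefficients) == len(variances) == len(dfs)):
        raise ValueError("all three sequences must have the same length")
    terms = [a * s2 for a, s2 in zip(coefficients, variances)]
    numerator = sum(terms) ** 2
    denominator = sum(t ** 2 / nu for t, nu in zip(terms, dfs))
    return numerator / denominator


# Two variance estimates, 4.0 on 10 d.f. and 9.0 on 15 d.f., simply added.
print(round(approximate_df([1, 1], [4.0, 9.0], [10, 15]), 2))
```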


Spache, George. “Deriving Comprehension, Rate and Accuracy 
of Reading Norms for a Short Form of the 
Metropolitan Achievement Reading Test.” Journal of 
Educational Psychology, XXXII (1941), 359-64. 

The Metropolitan Achievement Tests, although widely 
used, require a long testing time (about 2Ti. hours for Intermediate 
Partial). The uses to which the test is put indicate 
that abbreviation of certain tests is preferable to omission of 
sub-tests, if the time has to be shortened. The method used in 
abbreviating the Reading Test is described, and correlation 
coefficients between total grade-score and shortened test raw-scores 
are cited. Grade-score norms for the short form are 
given, derived from the regression equations, and also percentile 
norms for reading accuracy, the latter based on private 
schools and of uncertain validity for public schools. Validity 
coefficients for the shortened form are found to be almost as 
high as the reliability coefficients of the long form. F. A. 
Kingsbury. 


Thornton, G. R. “The Use of Tests of Persistence in the 
Prediction of Scholastic Achievement.” Journal of Educational 
Psychology, XXXII (1941), 266-274. 

A factor analysis of persistence tests revealed two unrelated 
factors; one appeared in the shock and pressure tests, the 
other in the word building and perceptual ability tests. The 
hypothesis that personality tests have value for predicting 
scholastic success in proportion to the degree of similarity between 
tests and classroom situations is suggested to explain 
the lower correlation between achievement and persistence 
found in this investigation. A formula utilizing scholastic 
efficiency and aspiration displayed in a previous school is presented 
for prediction of grades in a new school. V. Brown. 


Travers, L. B. “Improving Practical Tests.” Personnel Journal, 
XX (1941), 129-33. 

This article discusses the advantages of evaluating the 
personal characteristics of testees while they are undergoing a 
practical test, rather than while interviewing them. The ratings 
under actual working conditions are claimed to be more reliable. 
Sample charts are given for a carpenter’s practical test 
and the rating sheet used with it. H. M. Wolfle. 



Traxler, Arthur E. “A Study of the Junior Scholastic Aptitude 
Test.” Journal of Educational Research, XXXV 
(1941), 16-27. 

The Junior Scholastic Aptitude Test consists of three 
parts: verbal section, containing five subtests; numerical section, 
containing three subtests; and an experimental section, 
not included in pupil’s score. A practice booklet is issued 
several days prior to the examination for the student to work 
through at his leisure. The administration of the test proper 
is on a secret basis. Results are reported to the school in terms 
of a derived score. The reliability, validity, and prognostic 
value of the test are discussed as indicated by correlations with 
other tests of academic aptitude, achievement tests, and school 
marks. “The data are not extensive enough to be conclusive, 
but it is hoped that they will be of some assistance in appraising 
and using this new test.” D. A. Peterson. 


Wherry, Robert J. “An Extension of the Doolittle Method 
to Simple Regression Problems.” Journal of Educational 
Psychology, XXXII (1941), 459-464. 

This article describes a new method for solving simple regression 
constants which the author has found very successful 
in teaching beginners. It is shorter not only because it involves 
fewer arithmetical operations, but also because it is more systematic. 
Other advantages claimed are that the checks are 
more certain and convincing, and once the beginner has 
mastered the technique involved in simple correlation, he is 
immediately able to solve multiple correlation constants in 
precisely the same manner with little further training. V. 
Brown. 


Winetrout, Kenneth. “The National Teacher Examinations, 
1941.” Journal of Higher Education, XII (1941), 479-484. 

The writer gives a brief, general description of the length 
of the examinations and the method of giving them. This is 
followed by criticisms both adverse and appreciative of the 
construction of the examinations and potential use of test results 
obtained. It is suggested that if the question form were 
varied in the long testing periods (the examinations, total 
duration being twelve hours, are restricted to the use of the 
multiple-choice question form), a more adequate examination 
program might be obtained. The author believes that one 
section was concerned with measurement of attitudes rather 
than capacities and would substitute a three-point (conservative, 
liberal, radical) rating scale for the right-wrong 
classification used at present. Wording of questions, sectional 
influence, and factual emphasis are also discussed by the writer, 
who recently took the examinations. D. A. Peterson. 


Young, Gale. “A Note on Multidimensional Psychophysical 
Analysis.” Psychometrika, VI (1941), 331-33. 

On viewing Thurstone’s psychophysical scale from the 
point of view of the mathematical theory of one-parameter 
continuous groups, it appears that a variety of different 
psychological or statistical assumptions can all be made to lead 
to a scale possessing similar properties, though requiring different 
computational techniques for their determination. The 
natural extension to multi-dimensional scaling is indicated. 
(Courtesy Psychometrika.) 





EDUCATIONAL AND PSYCHOLOGICAL 

MEASUREMENT 


Volume II APRIL, 1942 Number 2 

A Test for Primary Business Interests Based on a Functional 
Occupational Classification . . . . 113 
     Alfred J. Cardall 

Measurement in Rural Housing: A Preliminary Report . . . . 139 
     Charles I. Mosier 

Procedure for Handling Tests and Examinations . . . . 153 
     John V. McQuitty 

Machines in Civil Service Testing . . . . 167 
     Sidney W. Koran 

Predictive Value of Certain “Law Aptitude” Tests . . . . 201 
     E. L. Welker and T. W. Harrell 

An Exploratory Study of Social Guidance at the College 
Level . . . . 209 
     Margaret Glockler Aldrich 

New Tests . . . . 217 

Measurement Abstracts . . . . 220 

Measurement News . . . . 227 











Copyright, 1942, by 

SCIENCE RESEARCH ASSOCIATES 


PRINTED IN THE UNITED STATES OF AMERICA 



A TEST FOR PRIMARY BUSINESS INTERESTS 
BASED ON A FUNCTIONAL OCCUPATIONAL 
CLASSIFICATION 


ALFRED J. CARDALL 
Boston University 

AS VOCATIONAL GUIDANCE leaves the area of 
pleasant gestures and advances towards a realistic process, 
it becomes more and more dependent upon better methods 
of evaluating an individual’s interests and potentialities. In 
spite of many imperfections, psychological tests still constitute 
our best means of diagnosis. It is questionable, however, if 
even the best analysis of an individual’s aptitudes, special 
abilities, and personality traits helps materially in indicating 
an occupational area in which the individual will find stimulation, 
economic independence, and satisfaction in his work. The 
results of such tests in the hands of a careful counselor serve 
chiefly in the determination of the “vocational risk” involved 
in the pursuit of those occupational activities contemplated by 
the individual. 

Perhaps the first step in vocational adjustment should be 
the crystallization of those interests in the individual as they 
relate to activities integral with a given job. Investigators in 
the field of measurement have long been concerned with vocational 
interests, but no previous attempt has been made to 
focus specific job-activity preferences on an occupational pattern. 
It is these data, after all, which point to the initial job, 
determine the individual’s interest or boredom in his first 
activities, and determine to some extent his progress in it. 

Interest measurement has been confined largely to the 
matter of general interests, and although such interests may 
be suggestive of occupational areas to be considered, they cannot 
be regarded as motivators of initial occupational activity. 
There is great need today for more specific measurement which 
will indicate initial jobs compatible with an individual’s interest 
in specific activities. 

The construction of the Primary Business Interests test is 
direct, functional, and highly specific in an area of available 
business positions for beginners. The individual is asked to 
express his preference or dislike for those specific job-activities 
which are characteristic of such beginning jobs. Such 
activities have not been empirically selected, but they were 
based on an extensive analysis of beginning positions, and only 
those specific activities which determine and differentiate definite 
occupational patterns were retained. 

Unfortunately no such functional occupational classifications 
are available, a fact which may explain why so obvious 
and direct an approach to interest measurement has not been 
used before. Actually no material changes in the matter of 
item selection have occurred in 20 years, and the first inventory 
appearing in 1921 under the name of the Carnegie Interest 
Inventory has set the pattern of which practically all general 
inventories now available are merely revisions. Scoring 
methods, it is true, have become highly refined, but it is doubtful 
if statistical refinement in scoring is any substitute for 
item-validity or basic data. 

Obviously the most desirable way of selecting items for 
any instrument would be to set the initial research so that 
the items would evolve as a matter of research rather than 
empirical choice. 

With this in mind, an extensive study was made of initially 
available business jobs. This study was based on a classroom 
assignment given to first-year evening college students who had 
several months’ experience in business. Each student kept a 
work diary of a typical week and was given a grade based on 
the specificity of detail contained. 106 different jobs covering 
a considerable range of activity were analysed, and over 2000 
specific items were listed. Items which occurred less than five 
times in the data were eliminated. Remaining items were reduced 
to terse expressions of the actual activity. Reduction 
in number of items was first achieved by grouping on a basis 
of a standard terminology wherever possible. Further reduction 
was based on the concomitance, similarity, and simultaneity 
of occurrences. The question of whether two items 
implied the same activity or invariably occurred together was 
decided by the majority opinion of five vocational counselors. 
Judgments were expressed independently on forms provided 
for that purpose. All counselors were actively engaged in 
placement work. 
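
The two mechanical steps in this reduction, dropping items that occur fewer than five times and settling doubtful pairs by a majority of five judges, can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the function names, the diary format, and the sample items are all hypothetical.

```python
from collections import Counter


def frequent_items(diaries, min_count=5):
    """Keep items mentioned at least min_count times across all work diaries."""
    counts = Counter(item for diary in diaries for item in diary)
    return {item for item, count in counts.items() if count >= min_count}


def majority_agrees(votes):
    """True when more than half of the independent judgments are 'same activity'."""
    return sum(1 for vote in votes if vote) > len(votes) / 2


# Hypothetical diaries: two items appear five times, one only four times.
diaries = [["file letters", "answer telephone"]] * 5 + [["sort mail"]] * 4
print(sorted(frequent_items(diaries)))

# Five counselors judge whether two items imply the same activity.
print(majority_agrees([True, True, True, False, False]))
```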

The list of job-items which appeared in the job-analysis 
form was the result of the early work in this study. The data 
used in its construction now have no further significance. The 
job analysis was used a year later with a different group of 
evening students who were similarly employed and similarly 
motivated. As the job analysis was in the form of a check-list, 
the data appeared in easily tabulated form. A page was attached 
to cover the listing of any item which may have occurred 
on the job, for which a printed item was not provided. However, 
analysis of these additions did not reveal frequencies 
high enough to warrant their inclusion in subsequent statistical 
work. 

A word of explanation as to the check columns on this form 
may be in order, although the instructions clearly cover them. 
These instructions were the same as those used for the final 
form and are given on page 132. The first set of columns 
provided a quantitative analysis of items occurring on the job. 
The instructions given for the second set of columns, however, 
were designed primarily to motivate the individual. Actually 
these columns had a definite research purpose to fulfill. The information 
they provided as to how often the items occur in the 
job ahead was an important factor in determining their ultimate 
importance. Nearly 300 job analyses were received, 
but only 245 of these covered positions initially available in 
the field of business. But before proceeding with the statistical 
analysis of these data, let us first review what has been done 
within the field of occupational classifications. 

We have indicated the soundness of an interest test based 
on an individual’s preference for specific job-activities. In 
order to score such a test, however, we are concerned with 
the way and manner in which these activities may be grouped 
into occupational patterns. 

Occupational Classifications 

The various methods of occupational classifications are 
largely empirical in nature. The majority are based on the 
census classification. This grouping is primarily concerned with 
economic factors. The importance of such a classification cannot 
be overlooked since a large part of the statistics available 
are based upon it. 

From the point of view of similarity of activities, the 
census classification is of little use. To associate musicians 
and osteopaths, or showmen and college presidents under the 
same heading gives no clue as to functions involved. Under 
trade, to follow advertising agencies by stock-yards is similarly 
of little help in understanding the nature of the activities. A 
further weakness is illustrated by listing together inventors 
and draftsmen under professional service; no concern is evidenced 
for occupational levels. 

With these difficulties in mind, an improvement on this 
classification was reported by Edwards¹ at a recent meeting of 
the American Statistical Association. 

A more elaborate outline was presented at the same meeting 
by Palmer.² 

Kimball³ redistributes the number of gainfully employed 
as reported in the census figures, by percentages. Reduced as 
it is to socio-economic groups, his study is serviceable in showing 
certain shifts of employment. 

¹ Alba M. Edwards, “A Social-Economic Grouping of the Gainful Workers 
of the United States,” Journal of the American Statistical Association, 
XXVIII (Oct. 26, 1940), p. 378. 

² Gladys L. Palmer, “The Convertibility List of Occupations and the Problems 
of Developing It,” Journal of the American Statistical Association, XXXIV 
(1939), p. 700. 

³ Kimball, Changes in the Occupational Pattern of New York 
State. (Albany, New York State Education Department, Educational Research 
Studies No. 2, 1937), p. 38. 

Similar classifications arise to expedite the very important 
matter of recording and tabulating job placements. For example, 
the Massachusetts State Employment Service classifies 
their placements by industrial groups, very similar to the census 
classifications, as well as by occupational groups. 

The Dictionary of Occupational Titles divides the major 
occupational groups into seven classifications, arranged alphabetically 
and identified by the first and second digits of the 
code numbers. Job classifications within these major groups 
are identified by three digit groups following the first two code 
numbers. 

Humphreys⁴ gives us another classification which is concerned 
with the general functional aspects of jobs as they 
apply to many industrial and commercial establishments. This 
classification groups functional activities regardless of the industry 
or field in which they are found. 

The counselor is more concerned with a similarity of the 
clerical functions in different fields than with the differences 
between the fields themselves. It is the actual function of the 
job which is significant to the prospective worker. This concern 
with the worker leads to other methods of classifications. 
Kelley⁵, Thurstone⁶ and others would classify occupations by 
the pattern of abilities required. This concern with multiple 
abilities will have far-reaching effects in occupational choices 
on the basis of profile-matching if the vocational application 
of such profiles is ever understood. 

Kitson⁷ attempts to group vocations by the kind of training 
required, and it is to be regretted that this approach has not 
been followed up by more specific work based on classifications 
of pre-entry requirements quantitatively expressed. 

Brewer⁸ gives us a three-dimensional concept of occupations 
classified by fields, functions, and occupational levels. 

⁴ J. A. Humphreys, How to Choose a Career (Chicago, Science Research 
Associates, 1940), 48 pp. 

⁵ Truman L. Kelley, Essential Traits of Mental Life (Cambridge, Harvard 
University Press, 1935). 

⁶ L. L. Thurstone, “A Multiple Factor Study of Vocational Interests,” 
Personnel Journal, No. 10 (Oct., 1931), 198-205. 

⁷ Harry Dexter Kitson, The Psychology of Vocational Adjustment. (Philadelphia, 
J. B. Lippincott Co., 1925). 

⁸ John M. Brewer, Occupations (Boston, Ginn & Company, 1936), 437-441. 


This limited treatment of the various approaches serves 
to illustrate a variety of classifications including industries, 
socio-economic factors, intelligence, abilities, and general interests. 
From the guidance point of view we are not concerned 
with the socio-economic status, as the number of trained workers 
in each occupation tells us little of its function. Classifications 
on the basis of abilities and intelligence give us more 
definitely an idea of the requirements of occupations, but disregard 
underlying interest in such activities. Classifications 
based on broad interest patterns fail too in that such patterns 
express only a broad attitude rather than immediate and specific 
interests in the actual work of the beginning occupation. 

A method of occupational classification of inestimable value 
would be one which brought together those jobs which call 
for the same specific activities in approximately the same 
proportion, regardless of field or title. In only a very general 
sense can we assume that either a mention of the field 
or occupational title gives any indication as to what is inherent 
in the job itself. In fact, we may regard such descriptions as 
often adding confusion to an already complicated picture. The 
only sound basis of grouping these occupations lies in the 
specific nature of the work, and accordingly calls for the same 
general structure of interest, skills, and personality traits on 
the part of the worker. Although the psychometrist has accomplished 
much from a diagnosis of an individual’s personal 
qualities, he has, as yet, been unable to indicate the social or 
economic significance of these same qualities. In the last 
analysis it is the latter phase which makes the first meaningful. 

This study presents a statistical approach to such a functional 
classification but within a limited range of occupational 
activity. Its purpose is to measure the relationships between 
specific job-activities of initially available business positions and 
to discover what common factors exist so that special functions 
may be isolated. The activities which in themselves are closely 
related and conversely alienated from the others form the 
patterns needed as a scoring scheme for our interest test. 

The same method of pattern determination is equally applicable 
to other threshold positions, as well as various occupational 
levels. The extension of this work would be infinitely 
worth while and of far-reaching consequence in the field of 
guidance. 


Setting Up a Contingency Table 

Our data consist of 245 analyses of initially available business 
jobs. The specific activities have been checked in each 
analysis in such a way as to indicate whether the activity 
occurs much or occasionally in the job. Before a computation 
of the interrelationships of these items can be made, a 
tabulation of the number of times that each item occurs in 
the data as well as the number of times which each item 
occurs with every other item must be recorded. That is to 
say, we must know how often item No. 1 occurs, also how often 
it occurs with 2, 3, 4, 5, and up to 115. We must know how 
many times item No. 2 occurs with 3, 4, 5, etc., and similarly 
until we have a table of all such contingencies. 

A contingency table so constructed gives us at a glance the
frequency of concomitance of any item with all others. Since
this particular table comprises 6,550 cells above the diagonal,
space hardly permits its inclusion here. The diagonal values
themselves represent the number of times that each item occurs
in the 245 analyses, and the largest value that any can take is,
of course, 245.
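
The tabulation just described can be sketched in a few lines of Python. This is only an illustration of the counting, not the punched-card procedure actually used; the function name `contingency_table` and the toy data are hypothetical.

```python
from itertools import combinations


def contingency_table(jobs, n_items):
    """Count how often each item occurs and co-occurs with every other item.

    jobs    -- iterable of sets of item numbers (1-based) checked on each analysis
    n_items -- number of distinct job-items
    Returns a symmetric (n_items + 1) x (n_items + 1) list of lists;
    row and column 0 are unused so that item numbers index directly.
    """
    table = [[0] * (n_items + 1) for _ in range(n_items + 1)]
    for items in jobs:
        for i in items:
            table[i][i] += 1              # diagonal: frequency of the item itself
        for i, j in combinations(sorted(items), 2):
            table[i][j] += 1              # cell above the diagonal
            table[j][i] += 1              # mirrored cell below the diagonal
    return table


# Hypothetical toy data: four job analyses over five items.
jobs = [{1, 2, 3}, {1, 3}, {2, 4}, {1, 2, 3, 5}]
table = contingency_table(jobs, n_items=5)
```

Here `table[i][i]` plays the part of the diagonal value for item i, and `table[i][j]` the frequency with which items i and j are checked on the same analysis.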

In constructing this table the I.B.M. tabulating equipment
was used, a Hollerith card being punched for each case.

The card provides 80 columns, each of which may be punched from
zero to nine, and above this range of digits two other positions
may be punched, making in all twelve positions within a column.
In this instance, the first three columns were used to carry the
number of the job-analysis form. For example, form 133 has 1
punched in the first column, 3 in the second, and 3 in the third.
In the fourth column the items between 1 and 9 occurring as
either much or occasionally on the job-analysis were punched; in
the fifth column number 10 was punched as zero, 11 as 1, up to 19
as 9; in the sixth column items 20 to 29; and similarly for the
remaining. The top two positions were not used, since dichotomies
of ten items in each column facilitated rapid punching and
eliminated more complex conversions. After the cards were punched,
they were checked by another clerk. The resultant contingencies
formed the basis for succeeding statistical work.
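
The column layout described above amounts to a simple rule: the tens part of the item number selects the column and the units digit selects the punch. The sketch below assumes that "similarly for the remaining" means the same rule continues through item 115; the function names are hypothetical.

```python
def punch_position(item):
    """Return (column, digit) for a job-item number.

    Columns 1-3 carry the form number; column 4 holds items 1-9,
    column 5 items 10-19 (punched as digits 0-9), column 6 items 20-29,
    and so on (assumed to continue in the same way up to item 115).
    """
    if not 1 <= item <= 115:
        raise ValueError("item number out of range")
    return 4 + item // 10, item % 10


def encode_card(form_number, items):
    """Build a {column: set of punched digits} mapping for one job-analysis form."""
    # Columns 1-3: the three digits of the form number.
    card = {col: {int(d)} for col, d in enumerate(f"{form_number:03d}", start=1)}
    for item in items:
        col, digit = punch_position(item)
        card.setdefault(col, set()).add(digit)
    return card


# Form 133 with items 3, 10, 19, and 27 checked (hypothetical example).
card = encode_card(133, [3, 10, 19, 27])
```

Under this rule item 115 would fall in column 15, so the items occupy only twelve of the 80 columns.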

Weighting the Job-Items

Our contingency table, comprising some 6500 cells, gives
us graphically the raw count of the occurrence of each item
with every other. We cannot assume, however, that all items
are of equal importance. Before going further with the not
inconsiderable computational work of pattern-determination,
let us see what items might now be eliminated as of little
importance in resultant groupings. What factors are significant
in making this decision?

Obviously, the frequency of occurrence of the item, indicated
as a diagonal value, must be considered; so, too, its variance
is a natural weighting factor. Other a priori considerations
also occur which may or may not be inherent in the data. Of a
list submitted to a group of placement officers these four were
considered most important:

1. Proportionate time devoted to each item on the job.

2. Need in job-activity of position ahead.

3. Amount of training required previous to employment.

4. Relative importance in the selection of the employee.

The first two of these considerations could be determined
from the data, since the job-analysis form used and previously
described provided check columns, so that an M, O, or R
(much, occasionally, or rarely) response could be recorded in
respect to each of these considerations. The third and fourth
considerations, however, could not be objectively determined.
Five vocational experts were, therefore, asked to rate each
job item on a three-point scale in these respects. The following
paragraphs are quoted from the written instructions to the
judges, which were further clarified in a group meeting before
the ratings were made.

"Amount of training involved. The phrase refers to the amount of
training received before employment and considered necessary by the
employer to perform the job activities involved. Such training naturally
differs in amount, which may be illustrated by such items as “post
bookkeeping entries,” in which case the employer would expect the
individual to have had some training, and such an item as “make up balance
sheet,” in which case considerable training previous to employment would
have been necessary. Please consider carefully, therefore, each of the
115 items in respect to this consideration, using the first column for your
check mark if you consider a relatively large amount of training is
involved previous to employment and using the second column if you
consider a lesser amount involved. Make no check marks if the amount of
training previously necessary is relatively little.

"Relative importance in selection of employee. The ability to do
certain of these items may be an important consideration in the selection
of an initial employee. This is particularly true in respect to
job-activities of the contact type. Make no check marks if the ability to do the
activity is of relatively little importance in selection. Use the fifth
column where ability to do this item becomes of importance in selection
and step up your check mark to the fourth position if you feel that it is
of considerable importance in employee selection. To illustrate, the
question of being able to “call on clients” is undoubtedly a factor in
selection. But an item such as “sell goods on commission basis” is
considerably more important, the ability to do which is probably the primary
consideration in the selection of an employee where such an item
probably constitutes his chief activity."

The weighting formula for each item is composed of these
six considerations. Since each cell is determined by the
concomitant occurrence of two items, it consequently has a weight
equal to the product of the weights of the two diagonals which
compose it. No cell value, of course, can exceed the product of
the square roots of the two diagonals, and it can in fact equal
that product only when there is maximal concomitance of the two
items in our data. The formula for the Aggregate Weight⁹ of each
cell value is, therefore, composed of the square roots of each
frequency, the square root of each variance, which in the case of
a proportion is pq, and an additive weight of the factors just
considered for each of the contingent items.


⁹ The vital assistance of Professor T. L. Kelley of Harvard in
developing the statistical procedures of this study is gratefully
acknowledged.



\[
\text{Agg. Weight}_{ij} = \left(W_i \sqrt{f_i p_i q_i}\right)\left(W_j \sqrt{f_j p_j q_j}\right) \tag{5}
\]

in which $W_i = w_1 + w_2 + w_3 + w_4$ and

\[ w_1 = \frac{M}{M + O} \tag{1} \]

\[ w_2 = \frac{2(\mathit{pu} + \mathit{su}) + (=)}{2(f + \mathit{pu})} \tag{2} \]

\[ w_3 = \frac{2L_t + S_t}{2j} \tag{3} \]

\[ w_4 = \frac{2C_s + S_s}{2j} \tag{4} \]


Formulae 1 through 4 take on values between 0 and 1 and
are additive in their function. These same numbers identify
each formula with the consideration described above. The
symbols are interpreted as follows. In (1), M is the number of
times the item occurs much on the job, and O the number of times
it occurs occasionally.

In (2), pu refers to the pick-ups, or frequencies of occurrence
in the job ahead but not in the present job, while su indicates
step-ups in the amount of the activity in the next job over the
present, and = refers to the tabulations of equal amount of
activity in the job ahead as in the present job.

In (3), L_t indicates the number of judgments that a large
amount of pre-entry training is required, S_t the number that a
somewhat smaller amount is required, and j the number of judges.

In (4), C_s indicates the judgments that ability to perform
the job activity is a considerable factor in selection of the
employee, and S_s the judgments that it is somewhat of a factor
in selection.

These aggregate weights range from as low as .44 (item 105)
to as high as 8.07. Items 12, 15, 16, 74, 78, 79, 105, 106, 108,
and 112 were eliminated because of low weights, as definitely of
little importance in further consideration. These ten items had
weights below unity and/or frequency of 12.
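
As a rough illustration of how the weighting formula combines its parts, the following sketch computes the additive weight W, the per-item factor W√(fpq), and the aggregate weight of a cell. The denominators of the two judged components (taken here as twice the number of judges) are an assumption, and all counts in the example are invented.

```python
from math import sqrt

N_JOBS = 245  # number of job analyses in the study


def additive_weight(M, O, pu, su, eq, f, Lt, St, Cs, Ss, judges):
    """W = w1 + w2 + w3 + w4, each component lying between 0 and 1."""
    w1 = M / (M + O)                               # (1) proportionate time on the job
    w2 = (2 * (pu + su) + eq) / (2 * (f + pu))     # (2) need in the position ahead
    w3 = (2 * Lt + St) / (2 * judges)              # (3) training; denominator assumed
    w4 = (2 * Cs + Ss) / (2 * judges)              # (4) selection; denominator assumed
    return w1 + w2 + w3 + w4


def item_weight(W, f, n=N_JOBS):
    """Per-item factor W * sqrt(f * p * q), with p = f / n and q = 1 - p."""
    p = f / n
    return W * sqrt(f * p * (1 - p))


def cell_weight(W_i, f_i, W_j, f_j, n=N_JOBS):
    """Aggregate weight of a cell: product of the two per-item factors."""
    return item_weight(W_i, f_i, n) * item_weight(W_j, f_j, n)


# Invented counts for a single item checked on 40 of the 245 analyses.
W = additive_weight(M=30, O=10, pu=5, su=8, eq=20, f=40,
                    Lt=2, St=2, Cs=1, Ss=3, judges=5)
factor = item_weight(W, f=40)
```

An item checked on every analysis has q = 0 and therefore a factor of zero, which is the sense in which the variance acts as a weighting factor.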

Computing Correlation Coefficients

In order to determine the relationships that existed between
these contingent frequencies, the cell values given in the
contingency table were converted into the more usual correlation
mold. It may be observed that the raw tabulations in the
contingency table can be expressed as p's, proportions, computed
by dividing the observed values by 245, which is the total number
of times each could have occurred in our sample. This table of
p's can next be converted into co-variances by subtracting from
the cell p the product of the diagonal p's occurring in its row
and column. This $p_{12}$ minus $p_1 p_2$ becomes the numerator
of the now easily recognizable formula for the product-moment
correlation, the denominator being the product of the standard
deviations of the two variables, each of which in this special
case is $\sqrt{pq}$. These values are readily obtained from the
new Kelley tables¹⁰ for any given p. For the contingency method,
then, of computing correlation coefficients we have


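\[
r_{12} = \frac{p_{12} - p_1 p_2}{\sqrt{p_1 q_1}\,\sqrt{p_2 q_2}} \tag{6}
\]

in which

\[
p_{12} = \frac{f_{12}}{N} \ \text{(cell)}, \qquad
p_1 = \frac{f_1}{N} \ \text{(diagonal)}, \qquad
q_1 = 1 - p_1, \qquad
N = 245.
\]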
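
Formula (6) can be applied directly to the counts in the contingency table. The sketch below is a minimal illustration; the function name `contingency_r` and the example counts are hypothetical.

```python
from math import sqrt


def contingency_r(f12, f1, f2, n=245):
    """Product-moment correlation of two dichotomous items from their counts.

    f12 -- number of cases in which both items occur (cell value)
    f1  -- number of cases in which the first item occurs (diagonal value)
    f2  -- number of cases in which the second item occurs (diagonal value)
    n   -- number of cases
    """
    p12, p1, p2 = f12 / n, f1 / n, f2 / n
    denom = sqrt(p1 * (1 - p1)) * sqrt(p2 * (1 - p2))
    if denom == 0:
        raise ValueError("an item occurring in none or all of the cases has no variance")
    return (p12 - p1 * p2) / denom


r_same = contingency_r(50, 50, 50)     # two items that always occur together
r_indep = contingency_r(10, 49, 50)    # co-occurrence exactly at chance level
r_never = contingency_r(0, 50, 50)     # two items that never occur together
```

Items that always occur together give a coefficient of unity, items whose joint frequency equals the chance expectation give zero, and items that never occur together give a negative value.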


The resulting intercorrelation table gives all positive
correlation values of .30 or better, and negative correlations
of .20 or greater. The dots occurring in other cells indicate
smaller values, which fall between -.20 and +.30. All values
below the diagonal, as in the contingency table, are omitted in
the interest of economy, merely duplicating as they do the values
above the diagonal.

It will be observed that in only one instance does any value
equal .80. This fact would seem to indicate that the work of
combining job-items on the basis of concomitance and
simultaneity was adequately done. It may be said that a correlation
closely approximating unity would be of little value in pattern
determination, since it would indicate invariably that one
job-activity depended entirely on the other or always occurred
with it.


¹⁰ The Kelley Statistical Tables (New York: The Macmillan Company,
1938).



Grouping Correlation Coefficients

Our task now becomes one of discovering what functional
patterns of occupational activities exist in our data. The
problem of a cluster analysis may be visualized as a plateau of
relationships which, if rearranged in rows and columns, would
result in a topographical map, high relationships forming a
peak surrounded by related and interrelated activities. In a
sand-pile graph, low relationships form the valleys and do
not help in distinguishing one cluster from another. In short,
these correlation coefficients must be so interchanged within
the matrix that the highest value in each row and column
appears as near the diagonal as possible. If such patterns as
we have postulated are inherent in the data, this re-arrangement
should result in clusters along the diagonal of the new
matrix.
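
The text does not say how the interchange of rows and columns was carried out. As one possible illustration only, the sketch below uses a simple greedy ordering (a swapped-in heuristic, not the procedure of this study): begin with the most highly correlated pair, then keep appending the unplaced item that correlates best with the item placed last. The function name and the toy coefficients are hypothetical.

```python
def greedy_order(r):
    """Order items so that strongly correlated items tend to sit side by side.

    r -- dict mapping frozenset({a, b}) to the correlation between items a and b;
         pairs not present are treated as zero.
    """
    items = sorted({i for pair in r for i in pair})

    def corr(a, b):
        return r.get(frozenset((a, b)), 0.0)

    first = max(r, key=r.get)            # the most highly correlated pair
    order = sorted(first)
    remaining = [i for i in items if i not in order]
    while remaining:
        # Append the unplaced item most related to the item placed last.
        nxt = max(remaining, key=lambda i: corr(order[-1], i))
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Toy coefficients: items 1 and 3 go together, as do items 2 and 4.
r = {
    frozenset((1, 2)): 0.10, frozenset((1, 3)): 0.70, frozenset((1, 4)): 0.00,
    frozenset((2, 3)): 0.20, frozenset((2, 4)): 0.60, frozenset((3, 4)): 0.05,
}
order = greedy_order(r)
```

Writing the matrix with rows and columns in the returned order places the high coefficients next to the diagonal, which is the effect the re-arrangement is meant to achieve.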

A table representing this re-arrangement was constructed,
showing that several clear-cut clusters or patterns are now
evidenced. Values of .40 and above occurring in a row and
column appear to indicate the extent to which an item “belongs”
to a pattern, and very few such values occur very far removed
from the diagonal. A closer grouping of these items, however, is
possible by the further elimination of items where no value of
.40 or better occurs in the row and column, and where two items,
such as 107 and 109, while closely related, appear to be nearly
discrete in themselves. Twenty-nine items may now be eliminated
as having no value as great as .40, leaving only 75 items which
fall into specific groupings. In a very few cases items have
values too high to be discarded, but appear to belong equally
well to two patterns. In delineating these patterns, therefore,
these items will be given one-half weight in respect to each
pattern.
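
The elimination rule stated above (drop an item when no coefficient in its row and column reaches .40) can be sketched as follows. The function name and the toy coefficients are hypothetical, and the further judgment about pairs such as 107 and 109 is not modelled.

```python
def split_by_threshold(r, items, threshold=0.40):
    """Separate items that reach the threshold with some other item from those that do not.

    r     -- dict mapping frozenset({a, b}) to a correlation coefficient
    items -- the item numbers under consideration
    """
    linked = set()
    for pair, value in r.items():
        if value >= threshold:
            linked.update(pair)           # both members of the pair are retained
    kept = [i for i in items if i in linked]
    dropped = [i for i in items if i not in linked]
    return kept, dropped


# Toy coefficients for four items.
r = {
    frozenset((1, 2)): 0.45,
    frozenset((2, 3)): 0.39,
    frozenset((3, 4)): 0.10,
    frozenset((1, 4)): 0.41,
}
kept, dropped = split_by_threshold(r, items=[1, 2, 3, 4])
```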


This table is too extensive to reproduce here, but a single
pattern will give a graphic illustration. By converting figures
into toned equivalents, the size of a correlation coefficient
becomes a density value which illustrates the degree of
relationship within patterns.



Following this chart are tables in which each cluster has
been treated as a single matrix, so that the relationship
between the items within it may be more carefully studied. The
name of each pattern has been determined by the most apparent
function within it.




TABLE 1


ACCOUNTING

45 41 42 43 44 48 47 58 59 57 55 56 60 64 61 52 29 51 72 111

42
40
45 53 44 46 44 42
45 53 46 48 46 40
46 47 41
48 47 44 48 40
56 63 65 44 46 48
61 59 52 49 44
79 48 63 59
47 58 57
46
66 45
46 45 41 44 50
50 50 48
43


 50 Verify bookkeeping records, audit, etc.
 49 Set up new system of accounts
 45 Make up tax returns
 46 Keep inventory records
 41 Make bookkeeping entries
 42 Post bookkeeping entries
 43 Take off trial balances
 44 Make up balance sheet, profit and loss statement
 48 Make out and figure payrolls
 47 Make out monthly statements, bills, figure extensions
 58 Reconcile check book with bank statement
 59 Enter checks received in check book
 57 Take care of petty cash
 55 Draw checks
 56 Make out deposits
 60 Figure trade discounts, commissions, etc.
 64 Figure salesmen's commissions
 61 Figure interest
 52 Check invoices, prices, discounts and allowances, etc.
 29 Determine credit risks
 51 Prepare special reports, sales analyses, etc.
 72 Type financial statements
111 Take deposits to banks, cash checks, collect bank statements and have checks certified


46
42
45 42 46 42
47 40
41
41
42 40
50 55 51 43
80 64 56
71 65
69




TABLES 2 AND 3

COLLECTIONS AND ADJUSTMENTS


26 25 27 28 68

50 14
43 26
36 25
56 27
40 28
68


14 Call on clients
26 Make personal calls for credit information
25 Make personal collection calls
27 Make customer adjustments, smooth out difficulties
28 Handle complaints, investigate
68 Telephone delinquent customers


JUNIOR CLERICAL


69 111 98 99 103 102

52
41
51
42
54
65
45
58

101 104 84 100

46 86
69
111
40 98
44 99
45 51 45 103
59 42 102
101
46 40 104
84
100


 86 Mail out statements and correspondence
 69 Address envelopes, bills, etc.
111 Take deposits to banks, cash checks, etc.
 98 Pay bills and bring back receipts
 99 Get mail and stamps at post office
103 Take mail, registered mail, and parcels to post office
102 Seal, weigh, and stamp mail
101 Fold letters, circulars for mailing
104 Run errands for employer
 84 Assist others, general handyman
100 Take messages, papers, and distribute mail




TABLES 4 AND 5

STENOGRAPHIC-FILING


86 69 71 67 65 66 96 87 72

52 86
51 40 69
60 45 49 71
67
43 65
66
96
46 87
72


86 Mail out statements and correspondence
69 Address envelopes, bills, etc.
71 Type letters, orders, forms, etc.
67 Answer telephone
65 Take dictation and transcribe
66 Code and type telegrams
96 File orders, letters, bills, reports, trade information
87 Look up information in files, library, etc.
72 Type financial statements


SALES-OFFICE


22 24 62 90 19 91 13 93 97 94 92

40 24
42 62
37 90
40 19
56 91
84 77 53 47 13
93
97
94
92


22 Make out price sheets
24 Check on competitors' prices, compare quotations
62 Figure quotations
90 Dictate letters, reports, etc.
19 Attend conference with supervisor
91 Organize work, train and supervise others
13 Make up sales contracts
93 Classify orders to size, patterns, salesmen, etc.
97 Keep trade information, credit information, up to date
94 Make forms and charts
92 Purchase merchandise, supplies, equipment




TABLE 6

SALES-STORE

7 33 37 4 114 5 40 11 31 1 2 20 9 8 6 113


38
10
46 39
51 34
40 7
45 33
61 50 47 41 54 37
45 51 42 53 4
40 114
40 5
42 40
44 11
41 31
46 44 43 1
41 2
55 44 20
65 47 50 9
49 8
42 6
113


 38 Make up and schedule shipments, etc.
 10 Deliver orders to customers
 39 Put up mail and telephone orders, fill orders to be shipped
 34 Check on quality of goods, examine for defects
  7 Take inventories
 33 Check and receive incoming supplies, record, etc.
 37 Unpack goods, put away and keep storeroom in order
  4 Wrap up bundles and packages
114 Sweep floors, empty waste baskets, clean up, etc.
  5 Put tags and labels on merchandise
 40 Dust shelves, put in order
 11 Restock shelves and cases
 31 Give information, quote rates over telephone
  1 Wait on customers, sell over the counter
  2 Sell goods over telephone
 20 Letter signs for stock display
  9 Set up displays, window trim, etc.
  8 Dismantle window displays
  6 Arrange display of food stuffs
113 Clean refrigerator, show cases, equipment




Final Occupational Patterns

The foregoing tables comprise the resultant six occupational
patterns, namely, accounting, collections and adjustments,
junior clerical, sales-office, sales-store, and
stenographic-filing. These correlation coefficients, however,
indicate only the “belongingness” of items within their pattern.
The relative significance of each item within the pattern is
determined by returning again to the table of weights. We now
relist these items under their proper pattern headings with
their respective weights as determined earlier in this study.


TABLE 7


ACCOUNTING                                                      Weight

 3. Verify bookkeeping records, audit, etc. ..................... 3.11
 5. Determine credit risks ...................................... 2.97
12. Make up tax returns ......................................... 2.59
16. Make out monthly statements, bills, figure extensions ....... 4.21
21. Make up balance sheet, profit and loss statement ............ 3.08
23. Make out and figure payrolls ................................ 2.60
26. Set up new system of accounts ............................... 1.25
27. Figure interest ............................................. 2.23
29. Make out deposits ........................................... 2.42
31. Check invoices, prices, discounts, allowances, etc. ......... 4.05
33. Post bookkeeping entries .................................... 3.83
35. Prepare special reports, sales analyses, etc. ............... 4.28
37. Enter checks received in check book ......................... 1.31
52. Reconcile check book with bank statements ................... 2.37
55. Keep inventory records ...................................... 4.46
57. Take off trial balances ..................................... 4.86
63. Make bookkeeping entries .................................... 4.63
68. Figure trade discounts, commissions, etc. ................... 3.14
69. Draw checks ................................................. 2.61
70. Take care of petty cash ..................................... 3.57
74. Figure salesmen's commissions ............................... 1.82
    Score ½ on:
38. Type financial statements ................................... 2.14
66. Take deposits to banks, cash checks, collect bank statements,
    and have checks certified ................................... 4.30


COLLECTIONS AND ADJUSTMENTS                                     Weight

 7. Handle complaints, investigate .............................. 7.38
11. Make customer adjustments, smooth out difficulties .......... 8.04
14. Call on clients ............................................. 3.33
17. Telephone delinquent customers .............................. 3.14
20. Make personal calls for credit information .................. 1.83
64. Make personal collection calls .............................. 2.99


JUNIOR CLERICAL                                                 Weight

 9. Pay bills and bring back receipts ........................... 3.07
36. Get mail and stamps at post office .......................... 2.74
40. Assist others, general handyman ............................. 2.94
41. Take mail, registered mail, and parcels to post office ...... 2.46
44. Fold letters, circulars for mailing ......................... 1.74
49. Seal, weigh, and stamp mail ................................. 2.68
50. Run errands for employer .................................... 2.51
61. Take messages, papers, and distribute mail to departments ... 1.91
    Score ½ on:
42. Mail out statements and correspondence ...................... 4.01
60. Address envelopes, bills, etc. .............................. 2.07
66. Take deposits to banks, cash checks, collect bank statements,
    and have checks certified ................................... 4.30


SALES-OFFICE                                                    Weight

 4. Check on competitors' prices, compare quotations ............ 2.66
15. Dictate letters, reports, etc. .............................. 2.25
22. Attend conference with supervisor ........................... 3.03
39. Make forms and charts ....................................... 3.65
45. Purchase merchandise, supplies, equipment ................... 5.91
48. Make out price sheets ....................................... 2.69
54. Organize work, train, and supervise others .................. 5.31
58. Make up sales contracts ..................................... 2.40
71. Keep trade information, credit information, up to date ...... 2.91
73. Classify orders to size, patterns, salesmen, etc. ........... 1.45
75. Figure quotations ........................................... 2.06


SALES-STORE                                                     Weight

 1. Wrap up bundles and packages ................................ 5.50
 2. Sell goods over telephone ................................... 6.10
 6. Unpack goods, put away, and keep storeroom in order ......... 4.58
 8. Give information, quote rates over telephone ................ 7.77
10. Put tags and labels on merchandise .......................... 3.09
13. Deliver orders to customers ................................. 2.50
18. Dust shelves, put in order .................................. 2.60
24. Restock shelves and cases ................................... 4.11
25. Set up displays, window trim, etc. .......................... 4.28
32. Make up and schedule shipments, etc. ........................ 3.98
34. Check and receive incoming supplies, record, etc. ........... 8.07
43. Arrange display of food stuffs .............................. 2.61
46. Put up mail and telephone orders, fill orders to be shipped . 4.17
47. Letter signs for stock display .............................. 1.38
53. Check on quality of goods, examine for defects .............. 7.53
59. Sweep floors, empty waste baskets, clean up, etc. ........... 2.04
62. Dismantle window displays ................................... 1.49
67. Take inventories ............................................ 6.26
72. Wait on customers, sell over the counter .................... 7.95


STENOGRAPHIC-FILING                                             Weight

19. Answer telephone ............................................ 5.08
28. Take dictation and transcribe ...............................
30. Code and type telegrams ..................................... 1.51
51. Type letters, orders, forms, etc. ........................... 3.95
56. File orders, letters, bills, reports, trade information ..... 4.67
65. Look up information in files, library, etc. ................. 7.46
    Score ½ on:
38. Type financial statements ................................... 2.14
42. Mail out statements and correspondence ...................... 4.01
60. Address envelopes, bills, etc. .............................. 2.07

It may be observed that these weights run from 1.25 to
8.07, with the greater part of them falling between values of
2 and 5. Since only one value exceeded 7.99, unit weights of
from 1 to 7 were used (determined by value without regard
to decimal) in scoring each item in respect to the pattern in
which it belonged.
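
The conversion to unit weights can be sketched as below. The text states only that the decimal was disregarded and that the range used was 1 to 7; treating a value above 7.99 as a weight of 7 is an assumption, as is the function name.

```python
def unit_weight(weight, cap=7):
    """Drop the decimal part of an aggregate weight and cap the result at `cap`."""
    return min(int(weight), cap)


examples = [unit_weight(w) for w in (1.25, 4.86, 7.95, 8.07)]
```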




Final Interest Test 

We have now arrived at a list of 75 specific job activities
which are common to initially available business jobs and which
fall within specific patterns; we have determined, too, their
relative significance within these patterns. An individual's
reaction, therefore, as to the extent to which he feels that he
would like or dislike these activities, can now be evaluated in
terms of specific beginning business positions. Before setting
up these items in terms of responses, however, we must consider
how they shall be listed. Merely to list them in the order
in which they appear under their pattern headings would
obviously condition responses to items which seemed to go
together. A random order seems desirable. Accordingly the
listing by patterns is now numbered from 1 to 75, and these
numbers are converted into a random order by Fisher and Yates'
Table of Random Numbers.¹¹
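
A present-day equivalent of drawing the order from a printed table of random numbers is to shuffle the item numbers with a seeded pseudo-random generator. This is a stand-in for, not a reproduction of, the table-based procedure; the function name and the seed are arbitrary choices.

```python
import random


def randomized_numbering(n_items=75, seed=0):
    """Map each pattern-order number (1..n_items) to a position in the shuffled test."""
    rng = random.Random(seed)            # fixed seed so the order can be reproduced
    positions = list(range(1, n_items + 1))
    rng.shuffle(positions)
    return dict(zip(range(1, n_items + 1), positions))


numbering = randomized_numbering()
```

Every number from 1 to 75 is used exactly once, so each item keeps a unique place on the answer sheet.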

On the final form¹² the questions appear directly on an
I.B.M. answer sheet. Instructions are also printed on this
sheet, and space is provided for name and pertinent information
concerning the individual taking the test. The instructions are
reproduced here, but the items have appeared earlier, though
in different order.


¹¹ R. A. Fisher and F. Yates, Statistical Tables for Biological,
Agricultural and Medical Research (London: Oliver and Boyd, 1938).

¹² Published by Science Research Associates.


Instructions

This questionnaire is designed to indicate how you feel
about those specific job-activities which characterize initial
business positions. You are to indicate your answer by
blackening the space between the proper pair of dotted lines.

The first three columns are headed L I D so that you
may record your response as like, indifferent, or dislike. If
you think that you would like to perform the job-activity
indicated as a part of your first business job, record your
response under L; if you feel uncertain or indifferent, under I;
if you feel that you would dislike this activity, under D. Omit
no items.

When you have completed this, go over the items again
carefully and indicate in the X column the five which you would
most like to do. There is no time limit, but you should work
fairly rapidly, as it is your first impression which is important.

Scoring 

The L I D responses are, of course, familiar to all. Here
the “L” response is scored with the weight previously
determined within its respective pattern. The “I” response is
scored as one-half that amount with all fractions dropped,
and the “D” response is scored as zero. The “X” response is
new to previous practice in this type of test, and allows an
opportunity for an individual to distinguish a little more
carefully between those job-activities which he feels he would
like as part of his initial job. It serves, too, as an additional
aid to the counselor, apart from the resultant pattern scores. The
fact that the individual selects five out of 75 items as most
to be desired should result in additional weights in respect
to such items. We may logically expect these selections to be
made in respect to items in which an “L” response has been
recorded, and an additional weight equal to one-half the
weight of the “L” response should result in some refinement
in scoring without being regarded as excessive. This “X”
response can be scored without additional runs through the
machine.
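
The scoring rules just given can be summarized in a short sketch. Two points are assumptions rather than statements of the text: that the "X" bonus is applied only where an "L" has been recorded, and that fractions are dropped for the bonus as they are for the "I" response. Items shared between two patterns at one-half weight are not handled. All names and the example weights are hypothetical.

```python
def item_score(unit_weight, response, starred=False):
    """Score one item: L = full weight, I = half weight (fractions dropped), D = 0."""
    if response == "L":
        score = unit_weight
    elif response == "I":
        score = unit_weight // 2
    elif response == "D":
        score = 0
    else:
        raise ValueError("response must be 'L', 'I', or 'D'")
    if starred and response == "L":
        score += unit_weight // 2        # assumed: bonus of half the L weight, floored
    return score


def pattern_score(weights, responses, starred=()):
    """Sum the item scores for one pattern.

    weights   -- {item number: unit weight} for the items of the pattern
    responses -- {item number: 'L' | 'I' | 'D'}
    starred   -- item numbers marked in the X column
    """
    return sum(item_score(w, responses[item], item in starred)
               for item, w in weights.items())


# Hypothetical three-item pattern.
weights = {3: 3, 5: 2, 57: 4}
responses = {3: "L", 5: "I", 57: "L"}
total = pattern_score(weights, responses, starred={57})
```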

The test may also be scored by hand. Special hand-scoring
folders are provided which hold the sheet in position for the
proper registration. The norms appear directly on this folder,
which is so cut that the score has to be recorded in the right
place each time. For machine-scoring two keys are provided
for each pattern. By picking up a number of “contrasts” each
time, only two runs are necessary to pick up positive weights
of from one to seven; thus machine and hand-scored norms
are identical. Since it is the same sheet in each instance, test
conditions are also identical.


Directions for Administering

Some students will complete this test in twelve minutes; 
the majority, however, finish in approximately fifteen minutes, 
and none takes over twenty minutes. The subjects’ first run 
through in respect to the L I D responses is done rapidly and 
without hesitation, and this observation leads us to believe 
that there is very little doubt in their minds as to how they feel 
about these particular job-activities. Some hesitance, however, 
is seen in selecting the five job-activities most to be preferred, 
which is equally desirable, as it indicates a tendency to weigh 
them carefully. 


Intercorrelations, Reliability and Validity


TABLE 8

                Acctg.   Coll. &   Jr.      Sales    Sales    Sten.
                         Adj.      Cler.    Office   Store    Filing

Accounting       .92     -.25      .08      .22      .10      .22
Coll. & Adj.              .73     -.13      .27      .00     -.05
Jr. Clerical                       .78      .01      .65      .41
Sales-Office                                .78      .26      .07
Sales-Store                                          .77      .31
Sten.-Filing                                                  .80


In Table 8 we present the intercorrelations of these patterns
with their reliability coefficients as diagonal values. With
the exception of the relationship between the Junior Clerical
and Sales-Store patterns, these coefficients are low enough to
indicate satisfactory independence of these patterns. The
correlation of .65 between the two mentioned, however, is too
high to be disregarded. Further, such raw coefficients
understate the actual relationships involved. To eliminate as much
of the chance factors as possible we apply the formula¹³

\[
\frac{r_{12}}{\sqrt{r_{11}}\,\sqrt{r_{22}}}
\]

to correct for attenuation. The corrected value of .83 clearly
indicates that these patterns do not act independently. A study
of the relation of these two patterns to the other four provides
still further evidence that these two patterns should be
combined.


¹³ C. Spearman, American Journal of Psychology, XV (1904), p. 271.


                   Acctg.   Coll. &   Sales    Sten.
                            Adj.      Office   Filing

Junior Clerical     .08     -.13      .01      .41
Sales-Store         .10      .00      .26      .31

Combining these two patterns would also serve to step up
the reliability, as this coefficient is materially affected by the
range of scores made on a test.


TABLE 9

                Acctg.   Coll. &   Sales    Sales    Sten.
                         Adj.      Office   Store    Filing

Accounting       .92     -.25      .22      .13      .22
Coll. & Adj.              .73      .27      .06     -.05
Sales-Office                       .78      .13      .07
Sales-Store                                 .86      .37
Sten.-Filing                                         .80


Table 9 represents the revised matrix on the basis of five
patterns, with the diagonal values representing the reliability
coefficients as in the preceding table. The highest relationship
now observable in this revised matrix is that of .37 between
Sales-Store and Stenographic-Filing, and even this is
satisfactorily low. Considered critically, it becomes .44 when
corrected for attenuation; squaring it, it becomes .19, and gives
us an idea of the variance which can be accounted for in this
relationship. More significant in our present consideration is
the amount of variance not accounted for, given by $1 - r^2$, in
this case .81.
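
The correction for attenuation divides an observed correlation by the square roots of the two reliability coefficients. The sketch below applies it to the coefficients printed in Tables 8 and 9; the function name is hypothetical. Computed from the two-decimal values as printed, the results come out within about .01 of the figures reported in the text, a difference of the size to be expected from working with rounded coefficients.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Observed correlation divided by the square roots of the two reliabilities."""
    return r_xy / (sqrt(r_xx) * sqrt(r_yy))


# Junior Clerical vs. Sales-Store (Table 8): r = .65, reliabilities .78 and .77.
clerical_vs_store = correct_for_attenuation(0.65, 0.78, 0.77)

# Sales-Store vs. Stenographic-Filing (Table 9): r = .37, reliabilities .86 and .80.
store_vs_sten = correct_for_attenuation(0.37, 0.86, 0.80)

shared_variance = store_vs_sten ** 2       # proportion of variance accounted for
unshared_variance = 1 - shared_variance    # proportion not accounted for
```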

It will be noticed that the combination of the two patterns
which were not independent now has a reliability of .86. The
lowest reliability coefficient occurs in the Collections and
Adjustments pattern with a .73, having only six items, and the
highest reliability coefficient in Accounting with .92, a pattern
comprising 23 items.

As has been pointed out, the size of a reliability coefficient
is partially determined by the number of items, particularly
when computed by the method of split-halves,¹⁴ and it might
be argued that more items should have been retained in setting
up these tests. The retention of such items, however, would
have been at the expense of the validity of the test, since the
test was constructed with this objective initially in mind. This
approach deviates materially from the common practice in
the construction of interest test questionnaires, which are
usually built from items empirically determined, and then
scrutinized to determine their validity. Items, of course, are
usually rated before inclusion in regard to relevance and
importance, and the ultimate validity determined by the analysis
of scores obtained by persons successfully engaged in such
occupations, or still more feebly, by the opinion of experts as
to internal consistency. The validity of these items was
determined by a cluster analysis, and only those which acted as
pattern-determiners were used in the test.


¹⁴ “Note on the Use of Spearman's Prophecy Formula for
Reliability,” Journal of Educational Psychology, XIV (1923), 301-305.

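
The dependence of reliability on test length is usually expressed by the Spearman-Brown prophecy formula, to which the footnote refers. The sketch below gives the general form of that formula as an illustration; it is not taken from the text, and the function name is hypothetical.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test of reliability r is lengthened k times.

    With k = 2 this is the usual step-up from a split-half correlation
    to the reliability of the full-length test.
    """
    return k * r / (1 + (k - 1) * r)


full_length = spearman_brown(0.50)        # split-half correlation of .50
unchanged = spearman_brown(0.73, k=1.0)   # same length leaves reliability as it is
```

Because the predicted value rises with k, a pattern of six items can be expected to show a lower coefficient than one of 23 items of comparable quality.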

Immediately following are norms for these final five patterns
in tabular form. They are given as standard scores and
normalized scores with M = 50 and σ = 10, and are based on
304 freshmen at Boston University, College of Business
Administration. Additional norms are available but are not
necessary in the clinical use of the instrument. What is
important is the “level of significance,” which is here considered
as the upper half of the range.
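
The relation between a percentile, its standard score, and a normalized score with mean 50 and standard deviation 10 can be sketched with the Python standard library. This illustrates the conversion in general and is not a reconstruction of how the norms were computed; the function names are hypothetical.

```python
from statistics import NormalDist


def percentile_to_z(percentile):
    """Standard score whose cumulative normal proportion equals the percentile.

    Percentiles of exactly 0 or 100 have no finite standard score.
    """
    return NormalDist().inv_cdf(percentile / 100)


def normalized_score(percentile, mean=50, sd=10):
    """Normalized score on a scale with the given mean and standard deviation."""
    return mean + sd * percentile_to_z(percentile)


z_95 = percentile_to_z(95)
score_95 = normalized_score(95)
```

The 50th percentile corresponds to a standard score of zero and a normalized score of 50; higher percentiles map to proportionally higher values on both scales.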


TABLE 10

                                        Raw Scores

Percentile  Standard   Acctg.   Coll. &   Sales-   Sales-   Stenog.-
            Scores              Adj.      Office   Store    Filing

   100                                              100       32
    99        2.32                          36       91       29
    98        2.05       74                 34       84       27
    95        1.64       68       30        32       77       24
    90        1.28       62       27        30       69       22
    80         .84       56       23        27       62       19
    70         .52       50       21        25       56       17
    60         .25       46       19        23       51       15
    50         .00       42       17        22       46       13
    40        -.25       38       14        21       41       11
    30        -.52       34       12        19       35        9
    20        -.84       28       10        17       29        7
    10       -1.28       22        6        14       22        4
     5       -1.64       16        3        12       14        2
     2       -2.05       10                  9        7
     1       -2.32        6                  7        2




Present Use of This Instrument 

This test is now being used for several purposes. First, 
it assists the freshmen of the College of Business Adminis¬ 
tration in the selection of academic majors. Interest in the 
particular job-activities involved in the occupational areas for 
which such majors train is by far the most important single 
consideration in making such selection. True, the probable 
"risk” of pursuing such training in terms of special abilities 
and personality traits still remains to be evaluated. Many 
instruments, however, are available for diagnostic purposes, 
and each entering freshman now regularly takes a battery of 
such tests upon admission. 

Second, it was used in the Evening Division of the College 
of Business Administration of Boston University where the 
immediate pursuit of an initially available job is economically 
imperative. The "X" response in particular has been found 
helpful to the counselor when the student comes to his desk, 
and the whole approach seems to better motivate the individual’s 
interest in the specific job-activities of beginning positions 
which he has been considering tentatively. 

For commercial students at the senior high school level 
who are looking for jobs the results should have the same 
significance as for the evening student just mentioned; they, 
too, need indicators for beginning positions. 

The test is also being used experimentally at the 9th grade 
level within the commercial curriculum to distinguish between 
bookkeeping, sales, and stenographic interests where such help 
is badly needed. In situations such as found throughout New 
York State where a course called “Introduction to Business” 
is given to all commercial ninth-graders, no problem as to 
terminology of test items is met; but where no such orientation 
course is given, several terms need discussion and explanation. 
The maximum usefulness of this direct testing technique rests 
on an extension of realistic work experience provided for in 
the curriculum. 

In out-of-school situations this test likewise has several ap¬ 
plications. Social agencies, faced with a need for specific 



counseling previous to job hunting, find it extremely useful as 
a direct pointer towards specific beginning jobs. In the Guid¬ 
ance Department at the Boston YMCA, for example, where 
an excellent and extensive job in counseling is being done, this 
test is basic to all batteries. All members of Darling’s “Job 
Hunters” group at Boston take this test since a large part of 
them are interested in beginning business jobs. Employment 
managers likewise in several situations are using this test for 
beginning office workers where the present scarcity of beginners 
places more emphasis on allocation than merely selection. 

Implications Arising From This Study 
The usefulness of this test is obviously limited within the 
range of an already generally determined interest in the field 
of business. What is still needed is a general interest test of 
this functional type which would allocate student interest into 
general areas. This accomplished, areas needing a more detailed 
diagnosis would be indicated. An extension of the horizontal 
testing range for initially available positions is equally 
feasible in other areas. As has been previously indicated, maxi¬ 
mum effectiveness of this positive approach to interest meas¬ 
urement depends upon the progressive development of the 
secondary school curriculum toward increased emphasis on 
realistic bits of work experience. Progressive educators predict 
that within a few years 25 per cent of the ninth-grade work 
may be of this type, increasing until 75 per cent of the twelfth- 
grade level curriculum will be composed of such work experi¬ 
ences. Such a program will go far toward bridging the present 
gap between school and job placement. 

It should be similarly noted that a vertical extension of this 
testing technique is equally possible. Actual experience on a 
job results in a refinement of interests in the field and should 
be measurable as a pattern indicative of progressive specificity. 
Many quite different jobs may be indicated from the develop¬ 
ment of more specific interests within a single initial pattern, 
and assistance in the selection of secondary level jobs could 
thus be provided. The technique for constructing such interest 
tests is now available. 





MEASUREMENT IN RURAL HOUSING 
A PRELIMINARY REPORT¹ 


CHARLES I. MOSIER 
Social Security Board 


THE SLOAN PROJECT in Housing Education at the 
University of Florida is an attempt to investigate the 
broad problem of the extent to which educational materials 
introduced through the school may exert an influence on the 
social, cultural and economic life of the community as a whole. 
Specifically, the problem investigated is the effect on the hous¬ 
ing status of the community, and on the housing attitudes of 
its members, of a broad program of education in all aspects 
of housing introduced through the schools. The general out¬ 
lines of the program involved the utilization of six white 
school communities of Florida, three experimental and three 
control. The plan of the experiment involves a determination 
of the present status of all six communities in those attributes 
which might be affected by housing education, the introduction 
of housing material into the regular curriculum in the schools 
of three experimental communities, and subsequent tests to de¬ 
termine the change in housing status, attitudes, and school 
achievement which can be attributed directly to the experi¬ 
mental program. 


The present report is concerned only with a summary of 
the initial measurement aspects of the experiment.² All measurements 
have been made in both experimental and control 
communities at the beginning of the experimental program to 


¹ The study reported here is being carried on at the University of Florida 
under the auspices of the Sloan Foundation in Applied Economics. 

• measurement program has been prepared and 




establish a base line from which to measure change, and they 
will be repeated at subsequent intervals throughout the course 
of the experiment as well as at its close. Certain of the measurements 
might well be made again several years after the 
termination of the experimental work of education in order to 
determine the stability of the results and to provide a measure¬ 
ment of those changes occurring after relatively long latent 
periods. As an instance of the latter, we might hypothesize 
that one of the outcomes of the experiment would be that 
those pupils who had been exposed to the experimental pro¬ 
gram of housing education would, on leaving school and estab¬ 
lishing homes of their own, secure more adequate housing 
than those who had not. The observation of the full impact 
of such effects could take place only twelve to fifteen years 
after the beginning of the experiment. Twelve years would be 
required for the pupil to progress normally from the first 
grade to high-school graduation and thus receive the fullest 
benefit of the educational program. At least three more years 
would elapse before any sizable proportion of the subjects 
would have married and become sufficiently established eco¬ 
nomically to provide themselves with homes which could be 
considered indicative of their ultimate housing status. It is 
not proposed that such an extended period of observation is 
essential to the program, but attention should be called to the 
long-range character of certain of its effects. 

The measurements which have been undertaken can be 
divided into four major areas: housing adequacy, housing 
attitudes and insight, housing information of pupils, and academic 
achievement. For the measurement of housing adequacy 
we have developed a Housing Inventory³ describing the 
objective condition of the house as observed by a trained field- 
worker, yielding a Housing Index —a composite score for the 
house obtained by weighting and combining these observations 
to provide for each house a single score. The inventory records 

³ For a detailed report on the development of the Housing Inventory, see 
C. I. Mosier, Measurement of Rural Housing Status, in preparation. 



obtained by interview have been supplemented, in a large proportion 
of the cases, by photographic records of the houses 
studied. 

It is conceivable that the program of education might pro¬ 
duce a real change in attitude toward housing, in insight into 
the present inadequacies where they exist,⁴ and in motivation 
toward better housing conditions, and yet there might be no 
externally observable improvement because of the pressure of 
economic circumstances. Because of this, an attempt has been 
made to measure the extent of such effects by a separate eval¬ 
uation of the answers to certain of the inventory questions. 
Plans have been made for a more direct measurement of atti¬ 
tudes and for the development of tests measuring achieve¬ 
ment in the acquisition of information in the field of housing, 
but these plans have not yet been fulfilled. 

It is important to know whether the introduction of hous¬ 
ing materials into the curriculum has been at the expense of 
training in the fundamental academic achievements, or whether 
it has resulted in more effective learning of the skills of read¬ 
ing, arithmetic and language through a use of material of more 
immediate interest and appeal than that contained in the cus¬ 
tomary curriculum. As the initial phase in the investigation 
of this problem, a program consisting of four achievement 
tests and an “intelligence” test has been administered in grades 
4-12 in all schools. The results of this initial testing and the 
subsequent retesting which is contemplated will provide valu¬ 
able information on this question and on others which may 
arise. 

Before beginning a summary of the initial results in each 
of the four areas of measurement, certain further general 
statements should be made concerning the communities investigated 
and the nature of the sample involved. While the 
detailed description of the several communities can best be 
presented in the light of the results of the initial surveys, certain 
broad generalizations can be made which will facilitate 
the understanding of these results. For the purposes of this 
study, a community is defined as consisting of all those families, 
and only those families, which send at least one child to a 
specific school being studied. This automatically restricts the 
population to white families. Such a definition necessarily involves 
a somewhat different use of the term “community” 
from that ordinarily envisaged. It excludes all those families, 
maintaining their identity as family units, which have been 
established so recently that there is no child of school age. 

⁴ In answer to the Inventory question, “What changes or improvements 
do you think are needed,” the occupant of one extremely dilapidated shanty 
said, “Well, if I could get hold of some cardboard boxes to tack up inside to 
cover the chinks, I reckon I’d have a right tight little place.” 

It excludes any families who have children of school age, but 
who, for one reason or another, do not send their children to 
the school in question. In one of the communities the field- 
workers reported this situation: “Are you going to see the 
Joneses down the road? They've got a flock of kids, but the 
kids don’t go to school because they ain’t got no clothes.” 
The extent to which this, and other comparable situations, pre¬ 
vail is, at present at least, unknown, but it does exist and in¬ 
fluences the sample studied, since the group under discussion 
will consist of the poorest families in the area. In certain of 
the communities the definition of the population imposes an¬ 
other restriction in the outlying districts. As the distance from 
school becomes great, the children of grade-school age go to 
the local school, so that only those families with at least one 
child of high-school age are included. The inclusion of a family 
in the “community’’ depends, then, not only on the age-dis¬ 
tribution of the children (and hence the length of time the 
family has been established), but on geographical location as 
well. These selective factors will, inevitably, limit the extent 
to which the results of this study can be generalized to the community 
as a whole, since the population studied is limited to 
those families sending at least one child to a participating 
school. 

The unit of the investigation, when it is not the individual 
pupil in the school, is the dwelling group. All persons living 
within the same dwelling-unit (house or separate apartment) 
are considered to constitute a “family unit.” A total of 745 
families was studied, with test records of 1028 children of 
school-grade above the fourth. There are records of 522 children 
in grades 1-3, but these records are incomplete and do 
not represent the total number of children in those grades. 
The population was defined as of the date of interviewing 
(October and November, 1940), and families moving into the 
community after that date were not considered. 

[Figure 1. Houses at selected levels of Housing Index value. 
Photographs not reproduced.] 

All of the primary data and much of the derived data have 
been recorded on electric accounting equipment cards, where 
they are readily available for further research. 

The evaluation of the housing status of the communities 
was by means of a Housing Inventory specifically designed to 
measure housing adequacy in rural areas. In addition to iden¬ 
tification data the Inventory recorded the responses to 85 items 
of the type: 

Fireplace and chimney: state of repair? 1. no fireplace. 2. poor: 
masonry cracked, mortar crumbling, many loose bricks. 3. fair: 
masonry discolored, occasional loose bricks. 4. good: no obvious 
repairs needed. 

Kind of cookstove (if more than one, mark the highest number). 
1. open fireplace. 2. makeshift stove: sheet-tin arrangements, etc. 
3. small wood stove. 4. large wood range. 5. electric. 6. gas, bottled 
or city. 

“How many rooms, not counting closets, porches (or the bathroom) 
are there?” (check answer with your own observation). 1. one. 2. two. 
3. three. 4. four. 5. five. 6. six. 7. seven. 8. eight. 9. nine. 10. ten. 
11. eleven. 12. twelve or more (specify). 

With the exception of answers to certain specified questions 
the observations recorded were those of a trained interviewer, 
not of the occupant. 
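
The coding of such an item for card punching can be sketched in modern terms as follows. The sketch is illustrative only: the class and its method are hypothetical names, and the only rule taken from the Inventory is the cookstove instruction to mark the highest number when more than one option applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryItem:
    """One Inventory item: a prompt plus its numbered response options."""
    prompt: str
    options: dict  # maps response number -> description

    def code(self, observed: list) -> int:
        """Return the single code to record for the observed option numbers.

        Follows the 'mark the highest number' rule quoted for the cookstove
        item; whether other items used the same rule is an assumption here.
        """
        if not observed:
            raise ValueError("at least one observation is required")
        unknown = [c for c in observed if c not in self.options]
        if unknown:
            raise ValueError(f"not valid response numbers: {unknown}")
        return max(observed)


COOKSTOVE = InventoryItem(
    prompt="Kind of cookstove (if more than one, mark the highest number)",
    options={
        1: "open fireplace",
        2: "makeshift stove",
        3: "small wood stove",
        4: "large wood range",
        5: "electric",
        6: "gas, bottled or city",
    },
)

if __name__ == "__main__":
    # A house with both an open fireplace and a small wood stove is coded 3.
    print(COOKSTOVE.code([1, 3]))
```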

The criterion which such an instrument should fulfill was 
established, available sources searched for suggestions as to 
possible items, and a preliminary edition prepared. This pre¬ 
liminary edition was subjected to the editorial scrutiny of 
research workers in the field of rural welfare, revised, and 
subjected to a searching field-test in which two Inventories 
were completed for each house under conditions comparable 
to those of the main study. The Inventory was revised again 
in the light of the field experience, and prepared in final form. 



Three interviewers meeting a set of pre-established qualifications 
were used. They were given intensive training in the 
use of the Inventory under actual field conditions and in establishing 
common standards of evaluation. A booklet of Instructions 
to Raters was prepared for their further guidance. All 
coding of the data and computations were performed sepa¬ 
rately from the interviewing, and a routine of computation 
established. An occupational code adapted to the needs of the 
particular problem has been devised, and applied to the clas¬ 
sification of the heads of the families studied. 

The data from the Inventory have been recorded on punch 
cards. A procedure has been devised to weight the responses 
to the individual items in such a way as to provide the maximum 
accuracy of measurement.⁵ The weighted responses have 
been combined into a single composite Housing Index meas¬ 
uring the adequacy of each house as a dwelling place. The 
reliability of this Index has been estimated in several ways. 
In a selected area separate from the six communities, but typical 
of them, 50 houses were inventoried twice, each time by a different 
interviewer, with a time interval ranging from one to five 
weeks in order to estimate the errors due to the interviewer 
and the time of interviewing. The consistency of the scores 
on this test-retest survey was exceptionally high (r = .96), 
and no systematic difference was found between the two inter¬ 
viewers studied. Estimates of reliability by the split-halves 
technique of determining reliability and by the method of 
rational equivalence both yielded reliability coefficients in excess 
of .97. The accuracy of measurement reflected by these values 
can be seen from the following considerations: 32 per cent of 
the houses received their true scores, and no house received a 
score in error by as much as four points out of a range of 
35 points. 
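
The three kinds of reliability estimate named above (test-retest, split-halves, and rational equivalence) can be sketched as follows. This is a simplified illustration, not the study's computation: it assumes items scored 0 or 1, uses an odd-even split, takes Kuder-Richardson formula 20 as the rational-equivalence estimate, and runs on made-up data. The Inventory itself used weighted multi-category responses, whose treatment is not described here.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_reliability(item_scores):
    """Odd-even split-half correlation, stepped up by Spearman-Brown."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 0 or 1."""
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    pq = 0.0
    for j in range(k):
        p = mean(row[j] for row in item_scores)  # proportion scoring 1
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / pvariance(totals))


# Hypothetical data: six houses (rows) scored on six 0/1 items (columns).
HOUSES = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

if __name__ == "__main__":
    first_visit = [sum(row) for row in HOUSES]
    second_visit = [6, 4, 3, 2, 2, 0]  # hypothetical rescoring by another rater
    print("test-retest r:", round(pearson_r(first_visit, second_visit), 2))
    print("split-half   :", round(split_half_reliability(HOUSES), 2))
    print("KR-20        :", round(kr20(HOUSES), 2))
```

The three estimates answer different questions: the first reflects disagreement between interviewers and occasions, while the other two reflect only the consistency of items within a single inventory.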

The Index was validated in part by the criterion of inter¬ 
nal consistency. The extent to which the weighting proced¬ 
ure automatically transformed certain first approximation 

⁵ A description of the statistical technique of weighting the item-responses 
is being prepared for early publication. 



weights which were not in accord with a priori considerations 
into weights which agreed closely with reasonable values was 
adduced as further important evidence of validity. A high 
degree of internal consistency among the Inventory items was 
revealed—houses good in one respect tended to be good in all 
respects, and conversely. As further validation the photo¬ 
graphs of one hundred houses were scaled for adequacy by the 
psychometric method of equal-appearing intervals and the 
correlation between these scale values and the Housing Index 
computed. The relationship was more than satisfactory 
(r = .81), but not high enough to justify substituting 
photographs for Inventory ratings. The meaning of individ¬ 
ual Index scores is further made graphic by the presentation 
of photographs of actual houses for selected values of Index 
score (shown in the illustration Figure 1). The typical hous¬ 
ing conditions at several score levels have been described and 
are presented in detail elsewhere as aids in the calibration of 
the Index.⁶ 

Some of the more significant descriptive findings of the 
housing survey are presented here. The median family con¬ 
sists of two adults, two children over twelve and one child 
under twelve years of age. Sixty-two per cent of the families 
own their own homes, 18 per cent rent, and 20 per cent are 
classified as share-croppers, squatters, or rent-free tenants. 
Fifty-nine per cent of the families gave farming as their only 
occupation; 12 per cent stated that they were on relief, or 
gave “public work” as their occupation. 

Twenty-two per cent of the houses have only three rooms, 
or fewer, but 17 per cent have rooms closed off and not used, 
unless for storage. Thirty per cent have less than one room 
for each adult or equivalent; 45 per cent have no separate 
living room. Thirty-eight per cent used auxiliary bedrooms 
(rooms used during the day for some other purpose); 27 per 
cent sleep with two persons or more in every bed, and in at 

⁶ C. I. Mosier, Measurement of Rural Housing Status, loc. cit. 



least 14 per cent of the houses, sex privacy in sleeping arrange¬ 
ments cannot be maintained. 

Forty per cent have inside walls completely unceiled, with 
the studding showing. Eight per cent have no decoration of 
the inside walls whatever, but only three per cent utilize 
handicraft decorations or native materials. Fifteen per cent 
have no pantry or storage space in the kitchen, or only poor 
makeshifts; 80 per cent have no kitchen sink whatever, and 
52 per cent have no refrigeration whatever; 72 per cent have 
no electricity. Eleven per cent of the families must carry their 
water more than a hundred feet, and another 61 per cent have 
only outside hand pumps or wells. Fifteen per cent have yards 
littered with garbage and refuse; 16 per cent have no toilet 
facilities whatever, and another 70 per cent have no better 
than an open surface privy. The prevalence of hookworm, 
typhoid, and dysentery is not surprising. 

Sixty-three per cent of the houses have chimneys which 
were judged to constitute some degree of fire hazard. Twenty-seven 
per cent of the houses have unglazed windows (wooden 
shutters only), and another 16 per cent have more than three 
broken panes. Only 41 per cent have all outside openings 
screened and in good repair to serve as protection against 
malaria or typhoid. The roof needs some repairing in 47 per 
cent of the houses, and in 6 per cent one can see daylight 
through it; 24 per cent show visible evidence of termite dam¬ 
age—only 2 per cent are termite-proofed—and 52 per cent 
show some degree of damage from dry-rot; 13 per cent have 
their foundations sagging, rotted, and crumbling. Sixty-three 
per cent showed no evidence of having been painted at any 
time; 37 per cent have no shrubs around the house and 28 per 
cent have no flowers; 58 per cent are lacking even bordered 
and sand-surfaced walks, while less than one per cent used 
pine-straw to surface the walks and drives. 

In spite of these objective conditions, 44 per cent of the 
occupants mentioned no more than one aspect of the house 
needing repair; only 12 per cent actually planned repairs, and 



the average family says that, if they won a hundred-dollar 
prize or found that sum, they would spend $57.00 on the 
house. 

The results of the survey have been tabulated for each 
community, and for the experimental and control groups, both 
in terms of the percentage response frequency for each item- 
response, and of the frequency distributions of the Housing 
Index.⁷ The differences between the experimental and control 
groups were systematically examined. Differences in housing 
status between the individual communities are very great: 
the best house in the poorest community is not as good as the 
average house in the best community. Differences between the 
experimental and control groups are small, the control group 
showing a slight superiority. 

Photographic records have been obtained for 517 of the 
houses studied. These photographic records were obtained 
under standardized conditions, so that they may be repeated 
at a later date. Map records showing the location of each 
house in each community have been prepared and are on file. 
The possibility of analyzing these data to show relations with 
geographic factors is considered. 

Achievement tests in reading, arithmetic, language, science, 
and mental maturity were given to 1028 pupils in all grades 
above the third. The results of this testing program have 
been analyzed by community and for the experimental and 
control groups. There was no discernible difference in the 
relative school achievement for the experimental and control 
groups, although there was considerable variation among the 
schools themselves. All schools were, on the average, markedly 
retarded in achievement as compared with the chronological 
age or the grade placement of the students. This mean 
retardation was from one and one-half to nearly three years, 
most marked in the higher grades, of course, and greater in 
science achievement than in any of the other fields. When, 

⁷ These data are presented in full in Measurement in the Field of Rural 
Housing, loc. cit. 




however, achievement is compared, not with chronological age 
or grade-placement, but with mental age, this apparent retardation 
disappears, so that it can be said that the schools are 
educating the pupils to the limits of their mental capacities— 
assuming that the intelligence test does measure mental 
capacity. 

The relationship between school achievement and housing 
conditions has been investigated. In spite of the reasons to 
expect that achievement would be related to the conditions of 
the home, the results do not bear out this expectation. The 
Housing Index showed correlations which were positive, but 
very low—ranging from .12 to .31—with the various meas¬ 
ures of school achievement. The most significant relation was 
between Index and grade placement, indicating that children 
in the higher grades tend to come from superior homes. 
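
One conventional way to gauge how low these correlations are is to square them, which gives the proportion of variance in one measure that is linearly associated with the other. The short sketch below applies that reading to the two extreme coefficients reported above; it is offered as an interpretive aid and is not a computation reported in the study.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance linearly shared by two measures correlated r."""
    return r * r


if __name__ == "__main__":
    for r in (0.12, 0.31):  # the range of coefficients reported in the text
        print(f"r = {r:.2f}  ->  r squared = {shared_variance(r):.4f}")
```

On this reading, the Housing Index is associated with roughly one to ten per cent of the variance in the achievement measures.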

Certain items of the Inventory were designed to measure, 
not the adequacy of the house, but the attitude of the family 
toward housing problems—insight into the housing condition, 
and motivation to better those conditions. When these items 
were studied by multiple factor analysis, the existence of a 
single factor of housing attitude, independent of housing ade¬ 
quacy, was convincingly demonstrated. This attitude variable 
is measured by the items dealing with willingness to spend 
money on the house, with ownership, with the number of re¬ 
pairs wanted by the occupant, with the difference between re¬ 
pairs needed and repairs wanted, and with whether or not 
repairs were planned. A method of measurement of the 
strength of this attitude in each family has been devised and 
is being applied to the individual families. 

Detailed plans for a more direct attack on the problem of 
measuring attitudes by means of a specially designed attitude 
scale have been prepared. The development of this scale and 
its application to the families studied is a project which, it is 
hoped, will be undertaken at the earliest possible opportunity. 

One of the contributions of this study, apart from the 
development of a Housing Inventory and its standardization, 



has been the accumulation of data for subsequent analysis in 
connection with specific problems which will serve as the base 
line from which the effects of the experimental curriculum 
can be judged. These data—coded answers to 93 items in the 
Housing Inventory for each of 715 houses, a measure of the 
adequacy of each house, a record of its location and photo¬ 
graphs for 517 of the houses, measures of school achievement 
for each of the 1028 children in grades 4-12 of the six schools, 
and summaries of the frequency and percentage frequency of 
each of the 685 item-responses of the Housing Inventory for 
each of the six communities and for the experimental, control 
and total groups—have been recorded and filed on Interna¬ 
tional Business Machines punch cards, and detailed indices to 
this information are presented in the detailed report.⁸ How 
valuable it is will depend on the extent to which these data are 
used to provide a knowledge of the factors affecting rural 
housing, whether those factors be educational, sociological, 
psychological, or geographical.⁹ 

The principal results of the initial measurement program 
can be summarized as follows: 

1. A Housing Inventory has been prepared and applied 
to the 715 homes of children in six rural Florida schools. 

2. A technique for weighting these responses to the Inventory 
to yield a measure of housing adequacy, the Housing 
Index, has been developed. 

3. A Housing Index, using the weights obtained, has been 
carefully standardized, using a group in addition to that on 
which the weights were developed. The reliability coefficient 
by test-retest with different interviewers was .96 and by internal 
consistency was .97. The Index has been validated by 
expert opinion, by internal consistency, and by comparison 
with psychophysical scaled judgments of adequacy based on 
photographs. The correlation coefficient between Index and 
scale values from judgments of photographs was .81. The 

⁸ Loc. cit. 

⁹ These data will be made available to any interested workers engaged 
in problems toward the solution of which these data might contribute. 


meanings of the various Index scores in the complete report 
have been interpreted by describing the typical houses and 
presenting photographs at various score levels. 

4. Standardized intelligence and achievement tests in six 
fields were administered to all the children of the schools. 
The correlation of children’s school achievement with home 
conditions as measured by the Housing Index was low for all 
measures of achievement, with coefficients ranging from .12 
to .31. 

5. By utilizing the answers of occupants to selected items 
in the Housing Inventory, a factor of housing attitude has 
been isolated, and initial attempts to measure it have been 
made. 





PROCEDURES FOR HANDLING TESTS 
AND EXAMINATIONS 


JOHN V. McQUITTY 
University of Florida 

THE BOARD OF UNIVERSITY EXAMINERS of the University 
of Florida conducts a program of testing somewhat 
different from that ordinarily performed by examining 
boards. Throughout the school year it offers a regularly 
scheduled series of progress tests in the basic courses in the 
General College. These tests are given in addition to the com¬ 
prehensive examinations which are given at the completion of 
each course. The Board integrates the International Test 
Scoring Machine and punch card tabulating equipment to en¬ 
able it to report results of tests promptly and adequately to 
all persons concerned. Also, the Board uses punch cards in 
building its library of test items. It is the purpose of this 
paper both to discuss the general work of the Board and to 
give the operations in detail, with special emphasis on the use 
of punch cards. 

At the University of Florida all of the freshmen and 
sophomores enroll in the General College for their first two 
years’ work. The Board of University Examiners was created 
in 1935 along with the establishment of the General College. 
The Board was charged with handling the admissions to the 
University and with the examinations given in the comprehen¬ 
sive courses which were to be offered in that college. At 
present the enrollment therein is about two thousand. The 
college examining activities of the Board come under two 
heads: comprehensive examinations given at the completion 
of the courses, and progress tests given at regular intervals of 
from two weeks to a month in each of these courses. The 
comprehensive examinations are six hours in length for two- 


semester courses, and three for semester courses. The results 
of these examinations form the sole basis for the assignment 
of the student’s final grade. The progress tests are similar to 
the comprehensives except that they cover smaller areas of the 
course and are usually only one hour in length. These tests 
are given to indicate to the student, his instructor, his parents, 
and the University officials how each student is doing. Even 
though some of these tests are given as early as eight months 
prior to the comprehensive examinations, the coefficients of 
correlation between results on progress tests and comprehen¬ 
sive examinations range from .65 to .83. Thus the importance 
of the progress tests as indicators of probable success on the 
comprehensive is demonstrated. 

When the progress testing program was first instituted, 
both students and faculty were somewhat skeptical of the 
value of progress tests since their results were not counted 
when the final grades were assigned. For one thing, the practice
of not averaging in test results at the end of the course
was a new and radically different procedure and hence subject 
to view with considerable alarm. Now that several years have 
shown that the progress tests are just as important whether or 
not they are included in the final grade, the question of their 
value is no longer raised, but their usefulness is taken as a 
matter of course. There are two definite reasons for basing 
the grade entirely on the final comprehensive: 1. The grades 
are then assigned on the basis of how well the examinee knows 
the course as a whole, since piecemeal learning of the material 
is not enough to insure success on the examination. 2. Under
such a practice the progress test results become sign posts along 
the way which indicate how the student and instructor are 
working together so that the former may achieve success on 
the comprehensive, but the student who compensates for an 
inadequate preparation and a consequent poor showing on the 
early tests by ultimate mastery is not penalized for his early 
failure. 

Examinees are permitted to keep the progress test booklets,
and these constitute an excellent source of material for
review. Also, the examinees’ answer sheets are returned to
them. Thus the student not only has a record of his raw
score and his percentile rank, but also a record of the answers
which he gave to the questions. By studying these answers in
relation to the key of correct answers which is returned with
the answer sheet, the student can make a detailed study showing
which items were missed and which were answered correctly.

In addition to the tests and examinations already discussed, 
the University sponsors each spring a state-wide twelfth-grade 
high-school testing program, the results of which are used by 
the University as entering placement tests. The General College
handles the announcements of the program and the distribution
and receipt of the test materials.

Separation of Instructing and Examining 

In theory there is complete separation of the teaching and
examining functions. The examining and the issuing of final
grades are both done by the Board of Examiners. However,
in actual practice there is the closest cooperation between the
instructional staffs and the Examiners. In most of the courses
the construction of the test items is done by persons engaged
in teaching those courses. These items are then subjected to
critical review by the Examiners, and any test items composed
by the Examiners are reviewed by members of the instructional
staff. In all cases the test or examination has the approval
of both the instructional staff and the Examiners. All
tests are printed, given, and scored by the Examiners. When
it comes time to set the grades (i.e., determine the raw-score
division points for the passing grades A, B, C, D, and the
failing grade E), the members of the instructional staff
cooperate again. The members of the staff and a representative
of the Examiners hold a meeting scheduled for this purpose.
In this task, use is made of all pertinent objective information
about those students making a given raw score; the results of
the entrance examinations and of the progress tests throughout
the year as well as the characteristic responses to those
test items considered most crucial are all considered for students
at the critical values dividing letter-grade equivalents.
Also at hand are the distribution of raw scores on the examination
and the corresponding percentile ranks. The anonymity
of the individual student is preserved throughout this procedure.
Again it must be made clear that all these data are
used as aids in determining where the grades should come on
the distribution of raw scores. In no sense are any of the data
“averaged-in” when the grade is assigned. Once it is decided,
for example, that scores of 400 or above are to receive A’s,
everyone in that category, but no one else, receives an A,
and so on for the other grades. There is no “grading on the
curve” in the sense that a predetermined distribution of grades
is followed. This is shown by the fact that even in a course
taken by as many as 700 persons the percentage of A’s has
varied from 5 to 11 and the percentage of failures from 12
to 21. Since 1935 the Board of Examiners has assigned 42,214
final grades with the following distribution:

TABLE 1

DISTRIBUTION OF GRADES FOR COMPREHENSIVE EXAMINATIONS
WINTER, 1936 THROUGH SUMMER, 1941

      Per Cent for Each Grade*         Total     Per Cent
   A      B      C      D      E      Examined    Absent     Total

  8.42  16.72  37.35  21.85  15.66     41,112      2.68      42,214

*Based on number examined.
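
The cut-score rule described above (every raw score at or above a division point receives that grade, and nothing else is averaged in) can be illustrated with a minimal sketch. Only the A cutoff of 400 comes from the text; the other division points and the function names are hypothetical and are chosen purely for illustration.

```python
from bisect import bisect_right

# Division points in ascending order. Only the A cutoff (400) is taken
# from the text; the D, C, and B cutoffs are hypothetical placeholders.
CUTOFFS = [250, 300, 350, 400]      # minimum raw score for D, C, B, A
GRADES = ["E", "D", "C", "B", "A"]  # E is the failing grade


def letter_grade(raw_score):
    """Return the letter grade for a raw score using fixed division points."""
    return GRADES[bisect_right(CUTOFFS, raw_score)]


def grade_percentages(raw_scores):
    """Per cent of examinees receiving each grade, based on number examined."""
    counts = {grade: 0 for grade in GRADES}
    for score in raw_scores:
        counts[letter_grade(score)] += 1
    n = len(raw_scores)
    return {grade: round(100 * c / n, 2) for grade, c in counts.items()}
```

Because the grade depends only on where the raw score falls relative to the division points, no predetermined distribution is imposed; the percentages are simply whatever the cutoffs produce.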

Office Routine 

In discussing the routine for handling the test results and
the test items, special emphasis will be given to those procedures
which may not be widely known. It is recognized that
practices for the processing of examinations will vary according
to the use to be made of the test materials and with the
mechanical equipment available. The Board of Examiners
makes use of the following mechanical equipment supplied by
the International Business Machines Corporation: test scoring
machine with graphic item counter, alphabetical printing punch,
high speed reproducer, interpreter, collator, and alphanumeric
tabulator with 25 alphabetical and 30 numeric type bars. The
routine of handling the progress tests is affected by the facts
that all answer sheets are to be returned to the students, and
that the sheets carry the raw scores and the percentile ranks.
In the case of the comprehensive examinations the answer
sheets are retained by the Examiners, the examinee receiving
nothing but his letter grade. Also, the need for prompt reporting
of results is particularly great in the case of progress
tests because it is felt that the results are more helpful to the
students if received while interest is still high. Hence, progress
tests are usually given on a Saturday morning and the
results returned to the instructors, administrative offices, and
the students the following Monday, even though as many as
1000 students are given two tests each. In the case of the
comprehensive examinations there is an equally great need for
prompt work because all examining for the year’s work for the
freshmen and sophomores must be accomplished and final
grades submitted within a period of two weeks. It would be
impossible to do all this on the limited budget of the Examiners
without making full use of the punch-card system. The
integration of punch-card methods with the test scoring machine,
however, permits a very small staff to handle a large
volume of tests in a short time and to make the results available
in a variety of forms to meet the varying needs of student,
instructor, department head, and administrator. The
use of punch cards will be discussed under the four following
heads: (1) placement tests; (2) progress tests; (3) comprehensive
examinations; and (4) library of used test items.

A. Punch Cards and Placement Test Results. In the
spring of 1941 the following tests were used in the high-school
testing program:

1. The Henmon-Nelson Test of Mental Ability, Form B

2. Cooperative English Test, Effectiveness of Expression, Lower
Level, Form Q

3-5. Cooperative Achievement Tests, Form QR, in

3. Social Studies

4. Natural Sciences

5. Mathematics

6. Cooperative French Test

7. Cooperative Latin Test

8. Cooperative Spanish Test

All of these tests are given with separate answer sheets 
which are machine-scored by the Board of Examiners. All of 
the test results are punched into tabulating cards, and the following
steps are employed in the procedure:

1a. Answer sheets are handled by schools, and when the sheets are
received, they are separated by tests and alphabetized. Some
visual check such as a corner-cut or a punched hole will aid in
the checking of the homogeneity of a pile of answer sheets.

2a. Name cards are punched for each examinee. Each card carries
a code number indicating the high school. Sex is indicated by
using F for female and M for male. A heading card with a
characteristic control punch is made for each high school.

3a. The name cards are listed on the tabulator in alphabetic order
by high schools on a prepared form which carries eight columns,
one for the raw scores for each test.

4a. The answer sheets which have already been separated according
to tests and alphabetized are checked against the list described
in 3a, to make certain that the answer sheets are in the
same order as the names on the list. In the case of persons who
took one but not all of the tests, the areas where results for the
missing tests would be recorded are marked. The practice of
having the answer sheets and names in identical order, with
absentees designated, facilitates the recording of the machine
scores.

5a. The answer sheets are scored on the test-scoring machine and
the raw scores recorded in the appropriate rectangle on the
name lists. Usually two persons are used, a machine operator
and a recorder. The operator gives the scores orally to the
recorder, who enters them in the proper place.

6a. The answer sheets are scored again and the raw scores checked
against those recorded in number 5a.

7a. The checked raw scores are punched into the name cards. The
name cards and the lists carrying the raw scores are in the
same order.




8a. The name cards are listed on the alphanumeric tabulator to
show name, sex, and raw scores. A comparing control is maintained
on high-school code number.

9a. The lists in number 8a are checked orally against the lists on
which the raw scores are written.

Note: From now on all operations are entirely mechanical.

10a. As soon as all of the scoring and punching has been done, the
distributions of raw scores are made by sorting the cards in
order by raw scores and running them through the tabulator
with a comparing control on raw scores if an interval of one
is desired (if any other interval is desired, it is necessary to use
interval heading cards to establish the control breaks); and
progressive totaling is used to secure the cumulative frequencies.

11a. Percentile ranks are computed for the distributions obtained in
number 10a. The percentile ranks are punched into heading
cards which carry raw scores and corresponding percentile
ranks. The corner-cut on the heading cards should differ
from that on the score cards.

12a. These percentile rank cards, which must carry an appropriate
control, can then be collated into the raw-score detail cards.
(Steps 10a, 11a, 12a, and 13a should all be done for one test
before anything is done for another test, if the sorting is to
be kept to a minimum.)

13a. By using the high speed reproducer the proper percentile ranks
are punched from the heading cards into the detail cards, using
a control punch to clear the punch magnets at the proper time.

14a. After all of the gang punching has been done (all gang
punching should be sight checked and the detail cards checked
on the collator for proper sequence order), all percentile ranks
can be interpreted at one time.

15a. The detail cards, which now carry percentile ranks, are sorted
alphabetically on three letters, then sorted by high schools
using the high-school code. Then the detail cards are checked
by hand to insure correct alphabetical order.

16a. Lists are run for each high school, showing all percentile ranks
for each examinee. Under some conditions it may be desirable
to run master alphabetic lists before the detail cards are sorted
by high schools.

17a. A master list of the results for all examinees in alphabetic
order is prepared on the alphanumeric tabulator. Usually this
list is prepared on a stencil or duplicator paper, so that a
large number of copies can be made.
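
Steps 10a and 11a (a frequency distribution with progressive totals, followed by percentile ranks) can be sketched in a few lines. The text does not state the formula used for the percentile rank, so the midpoint convention below, the rounding rule, and the clamping to the range 1 to 99 are assumptions; the function names are hypothetical.

```python
from collections import Counter


def score_distribution(raw_scores):
    """Return (score, frequency, cumulative frequency) rows, lowest score first."""
    freq = Counter(raw_scores)
    rows, running = [], 0
    for score in sorted(freq):
        running += freq[score]          # progressive total, as on the tabulator
        rows.append((score, freq[score], running))
    return rows


def percentile_ranks(raw_scores):
    """Map each raw score to a percentile rank.

    Assumed convention: per cent of cases below the score plus half the
    cases at the score, rounded half up and kept within 1..99.
    """
    n = len(raw_scores)
    table = {}
    for score, f, cum in score_distribution(raw_scores):
        below = cum - f
        rank = int(100 * (below + f / 2) / n + 0.5)
        table[score] = min(99, max(1, rank))
    return table
```

The resulting table plays the part of the heading cards in step 11a: one entry per raw score, carrying the rank that is later copied into every detail card with that score.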



B. Punch Cards and Progress Test Results. The nature
and use of progress tests have already been discussed. Punch
cards are used here as an aid in making the results available
quickly to the instructors and administrative officers. The steps
in preparing the cards and using them are as follows:

1b. As soon as registration is complete, the class cards in the
Registrar’s Office are duplicated for each course in which
progress tests are to be given. The information picked up
from the Registrar’s card is:

(1) Student number

(2) Student name

(3) Course and section

2b. The cards prepared in number 1b are alphabetized for three
letters on the sorter and the alphabetizing checked by hand.

3b. The collator is now used to insert a sequence number card in
front of each set of cards for the same student. These sequence
cards are prepared in advance and carry a control punch as
well as numbers in sequential order from 0001 to 3999. Only
odd numbers are used, the evens being reserved for future
expansion due to errors or to late registration changes.

4b. The progress test cards with the inserted sequence cards are
run through the tabulator with a comparing control on both
sequence number and student number, with the machine set to
tabulate. The resulting list is then checked visually to see
that a sequence card has been inserted at the proper place and
that the test cards are in proper alphabetical order. Any
errors are corrected.

5b. The deck of cards used in number 4b is run through the reproducer,
and the sequence numbers are punched from the
sequence cards into the test cards.

6b. The progress test cards can always be placed in strict alphabetical
order merely by sorting them on the numerical sequence
number of four digits.

7b. Next, placement test deciles are gang punched into the test
cards. This is done by sorting on student numbers (both the
placement test decile cards and the test cards carry student
numbers), with decile card coming first, and running the entire
deck through the reproducer with a control on a suitable punch
in the decile card.

8b. When a progress test is given, the answer sheets are scored on
the test scoring machine, the answer sheets distributed, and
percentile ranks computed and recorded on the answer sheets.
Letter grades are recorded also, if any are assigned.



9b. The answer sheets are alphabetized and checked against the
deck of progress test cards for that course; cards are pulled for
absentees; and both the answer sheets and the cards are put in
the same order to facilitate punching the test results into
the cards.

10b. Percentile ranks (and grades, if any) are punched from the
answer sheets into the test cards on the alphabetical printing
punch. The punching is checked.

11b. The cards now go to the tabulating department where the following
operations are executed:

(1) Absentees are re-inserted on the collator and the sequence
of the cards checked on the collator.

(2) An alphabetical list of all students is prepared on the
tabulator showing the following data: student name and
number, course and section, and the results of all progress
tests to date.

(3) An alphabetical list of students by sections is run on the
tabulator, showing the same data as for (2), above.

12b. The lists made in (2) of 11b are checked orally against the
answer sheets as an added precaution to insure accuracy.

13b. The answer sheets and lists of results are given to the proper
instructor for each section. The lists are for his use; the answer
sheets are returned to the students. Also, the students receive
key sheets of the correct answers, so that they may see just
what they missed on the test.

14b. About once a month a composite list is run on the tabulator.
This list shows the progress test results for all courses for each
student, and through them it is possible to see how the student
is doing in all of his courses. The sequence number is used to
put the cards in one alphabetical order, where all the cards for
each student are together. It has been found best to keep the
cards in the order in which the composite lists will be run.
When a test is given, the cards for that test only are selected
from the entire deck. As soon as the details of handling that
test are completed, the cards are collated back into the composite
deck.

15b. By the end of the school year, the cards represent a complete
picture of the record of each student in each course. All kinds
of statistical studies are possible from these cards, among
which are:

(1) Correlation between placement tests, progress tests, and
comprehensive examination grades.

(2) Investigation of quality of work done by those who drop
or resign.

(3) Correlation between grades in different courses.

(4) Grade distributions.
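
The sequence-number device of steps 3b through 6b (odd numbers assigned in alphabetical order, evens held in reserve) lends itself to a short illustration. The sketch below is only a model of the idea: names stand in for the alphabetized decks, and both function names are hypothetical.

```python
def assign_sequence_numbers(names):
    """Give alphabetized names the odd numbers 1, 3, 5, ... (0001 to 3999)."""
    ordered = sorted(names)
    if len(ordered) > 2000:
        raise ValueError("odd numbers 0001-3999 cover at most 2000 students")
    return {name: 2 * i + 1 for i, name in enumerate(ordered)}


def insert_late_registrant(seq, name):
    """Assign a reserved number that keeps numeric order alphabetical.

    The new number is placed just after the nearest alphabetical
    predecessor. A late name that sorts ahead of everyone cannot be
    placed, since nothing lies below 0001; that case raises an error.
    """
    earlier = [n for who, n in seq.items() if who < name]
    later = [n for who, n in seq.items() if who > name]
    low = max(earlier, default=0)
    high = min(later, default=4000)
    if low == 0 or high - low < 2:
        raise ValueError("no free sequence number at this position")
    seq[name] = low + 1
    return seq[name]
```

Once every card carries its number, a purely numerical sort on four columns restores strict alphabetical order, which is the point of step 6b. With single gaps between odd numbers, only one late entry fits between any two neighbours in this sketch.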




C. Punch Cards and Comprehensive Examinations. At
the end of the school year it is necessary to give, score, and
report grades for all the comprehensive examinations within a
period of less than two weeks. It has been found that the
use of punch cards speeds the work and increases the accuracy.
The steps in preparing and using these cards are:

1c. A deck of comprehensive examination cards is made for each
course by reproducing the progress test cards, except that it is
necessary to omit the first two progress test results to make
room for the raw scores on the comprehensive. The remainder
of the card is reproduced because the placement test results and
the progress test results are useful as aids in setting the
comprehensive grades.

2c. Decks of master cards for raw score intervals are prepared.
These are used to enable the tabulator to make the distribution
of raw scores for each examination. Decks with intervals
of 2, 5, and 10 have been made. It has been found helpful to
have duplicate decks to facilitate handling of courses where
identical intervals are used. These cards carry a control punch.

3c. Percentile rank master cards are prepared also. An interval
of 1 is used, and the range is from 01 to 99. It is well to have
several sets of these and to have an abundance of cards for
ranks below 10 and above 90, since several class intervals may
have the same percentile rank within these ranges. These
cards carry a control punch and are used to gang punch percentile
ranks into the examination cards.

Note: In all instances it is well to have the master cards with
a corner-cut different from that of the detail cards.

4c. In handling comprehensive examinations the student number
rather than the name is used to identify the student. This is
done to impersonalize the examining and to simplify the procedures,
because operations can be done more readily on a
numerical than on an alphabetical basis.

5c. These cards are used to prepare attendance lists for each
examination room and a master-list for use in checking in the
papers at the end of the examination.

6c. As soon as the examination is over, the answer sheets are placed
in numerical order and checked against the check-in rolls. This
is done to make certain that no answer sheets have been misplaced.
A special form is filled out for each absentee and
inserted in its proper place in the stack of answer sheets. This
means that there is either an answer sheet or an absentee sheet
for each name on the roll and for each examination card.
This has been found to be preferable to pulling the cards for
the absentee.

7c. The answer sheets are scored on the test-scoring machine. If
more than the front of one answer sheet is used, all of the raw
scores are recorded on the front of the first answer sheet. If
the student has more than one answer sheet (which is usually
the case, since most of the examinations have both morning
and afternoon sessions), great care must be used to be certain
that all the scores on the sheet are for the same individual.

8c. After the answer sheets have been scored and checked and the
addition of the scores completed and checked, the total raw
scores are punched into the cards mentioned in number 1c.
The punching is facilitated because both the cards and answer
sheets are in order by student number and there is either an
answer sheet or an absentee blank for each card.

9c. From now on all operations are mechanical except the checking
of the punching of the total scores. After the checking, the
cards are sorted on total score, with the interval cards being
placed first in the hopper. This sorting places the cards in
order by score, with the interval cards coming at the proper
place. It is well to check the sequence of the cards on the
collator.

10c. The cards are now tabulated with a control being taken on
the 11-zone punch in the interval cards (the detail cards being
blank in the control column). The tabulator will print the
intervals from the interval cards, count the detail cards between
interval cards, and record the frequencies and the progressive
totals. To insure that the count for each interval is
placed on the same line with the intervals, only the upper hub
of the comparing control must be used (i.e., there is no plugging
to the add-hub of the control source), so that a control
change will occur only when going from blank to X-punch,
but there will be no change when going from X-punch to
blank.

11c. The percentile ranks are computed from the progressive totals,
and the proper percentile rank card (selected from the prepared
deck of percentile rank cards) is inserted manually just
back of the interval card with which that percentile rank goes.

12c. The cards are re-tabulated with both the interval cards and
the percentile rank cards in the deck. In this operation the
percentile rank card is handled as a detail card (i.e., it is blank
in the control column), but it carries another X-punch to keep
the tabulator from counting it. By taking the percentile rank
through a counter, it is possible to print the ranks on the same
line with the interval, the frequency, and the progressive total.
Both interval cards and detail cards are blank in the percentile
rank column, so that all the counter receives is the percentile
rank. This re-run serves as an accurate check on the original
distribution and the insertion of the percentile ranks at the
proper place. The first step of the check is to compare the
second distribution with the first to check frequencies and progressive
totals. The insertion of the percentile rank cards can
be checked by comparing the machine-recorded ranks with those
obtained in the computation on the first distribution.

13c. The complete deck of cards is run through the reproducer to
gang punch the percentile ranks into the detail cards. It is not
necessary to remove the interval cards before this operation if
both the interval cards and the percentile rank cards contain
a common X-punch which can be used to clear the punch
magnets. It should be recalled that the percentile rank cards
were inserted behind the interval cards. After the punching
has been finished and sight-checked, both the interval cards
and the percentile rank cards can be separated from the deck
by using the sorter. Then the percentile ranks are interpreted.

14c. The deck of comprehensive examination cards is left in order
by score and percentile rank until after the grades are set.
Since the cards contain the placement test results, most of the
progress test results, and the score and rank on the comprehensive
examination, the information they reveal helps in setting
the grades. Also, the distribution of raw scores and the
specified answers given to certain key questions are used in setting
the grades. For example, in setting the grades someone
may wonder what kind of persons we find at the 10th percentile
from the bottom. By referring to the comprehensive
examination cards, we can learn the quality of their placement
test results and their relative success on progress tests, and by
referring to the answer sheets we can see which questions they
missed and which they answered correctly. If we find that
most of the students at this percentile are missing items based
upon elementary facts and principles, we feel that we cannot
pass persons at that level. A higher level can be investigated
until a satisfactory one is found. Such a procedure can be used
until all of the division points for the various grades have
been set.

15c. After the grades have been set, the grades are gang punched
into the detail cards and the grades interpreted.

16c. Next the detail cards are alphabetized on the sorter by sorting
on sequence number, and grade reports are printed by the
tabulator on specially prepared forms, so that copies of the
grades can be reported by the Registrar and to others concerned.




17c. Since the grade cards contain so much pertinent information
regarding each student’s academic record during the year,
many statistical studies can be made from them.
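
The control-break tabulation of steps 9c and 10c and the gang punching of step 13c can be modelled roughly as follows. This is a sketch under stated assumptions, not a description of the actual plugboard wiring: a boolean field stands in for the X (11-zone) control punch, the deck is sorted in ascending order, the lowest interval card is assumed to lie at or below the lowest score, and the separate percentile rank cards are collapsed into a lookup keyed by interval. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Card:
    score: int                          # lower bound, for an interval card
    is_interval: bool = False           # stands in for the X control punch
    percentile: Optional[int] = None    # filled in by gang punching


def build_deck(scores, width, low, high):
    """Merge interval cards and detail cards, each interval card first."""
    intervals = [Card(lo, is_interval=True) for lo in range(low, high + 1, width)]
    details = [Card(s) for s in scores]
    # The sort is stable and interval cards are listed first, so at equal
    # scores an interval card stays ahead of the detail cards it heads.
    return sorted(intervals + details, key=lambda card: card.score)


def tabulate(deck):
    """Return [lower bound, frequency, progressive total] per interval."""
    rows, running = [], 0
    for card in deck:
        if card.is_interval:            # control break: begin a new line
            rows.append([card.score, 0, running])
        else:                           # detail card: count on current line
            running += 1
            rows[-1][1] += 1
            rows[-1][2] = running
    return rows


def gang_punch(deck, ranks):
    """Copy each interval's percentile rank into the detail cards behind it."""
    current = None
    for card in deck:
        if card.is_interval:
            current = ranks[card.score]
        else:
            card.percentile = current
    return [card for card in deck if not card.is_interval]
```

The one-directional control change described in step 10c corresponds here to starting a new line only on reaching an interval card, never on leaving one.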

D. Punch Cards and the Library of Test Items. An
item analysis is made of the items used on each progress test
and comprehensive examination. In making the analysis, 100
answer sheets uniformly distributed throughout the highest
and lowest quarters are selected as a sample. (If the number
of examinees is less than 400, samples of 50 may be used, or
the upper and lower halves may be used instead of quarters.)
The analysis is made on the test-scoring machine by utilizing
the graphic item counter, and the count is made on the correct
responses. For the items found to have low discriminating
value the frequency counts for each distractor are made,
unless the low validity is obviously due to excessive ease or
difficulty. Four measures are secured for each item: V, the
validity or discriminating power, which is the tetrachoric coefficient
of correlation between the item and the total test;
D, the difficulty, expressed as per cent of the group answering
the item correctly; H, the per cent of the highest quarter or
upper half answering the item correctly; and L, the per cent
of the lowest quarter or lower half answering the item
correctly.
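
Three of the four measures (H, L, and D) reduce to simple counts and can be sketched as below. Several points are assumptions rather than the Board's stated practice: the whole highest and lowest quarters are used instead of a sample of 100 sheets from each, and D is taken as the per cent correct in the two groups combined. The tetrachoric coefficient V is not computed here, since the text does not say how it was obtained. Function names are hypothetical.

```python
def split_quarters(total_scores):
    """Return (highest quarter, lowest quarter) of examinee ids by total score."""
    if len(total_scores) < 4:
        raise ValueError("need at least four examinees to form quarters")
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    k = len(ranked) // 4
    return ranked[:k], ranked[-k:]


def item_statistics(correct, high, low):
    """Per cent answering one item correctly in each group.

    `correct` maps examinee id to True or False for a single item.
    """
    high_right = sum(correct[e] for e in high)
    low_right = sum(correct[e] for e in low)
    return {
        "H": 100 * high_right / len(high),
        "L": 100 * low_right / len(low),
        # Assumed: difficulty estimated from the two groups combined.
        "D": 100 * (high_right + low_right) / (len(high) + len(low)),
    }
```

An item with H well above L separates the strong examinees from the weak; an item with H and L both near 0 or both near 100 shows the excessive difficulty or ease mentioned above.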

An outline is made of the course content of each comprehensive
course with major, intermediate, and minor classification
of topics, and on the basis of this outline each test item
is classified according to the aspect of the course which it 
covers. This classification and the item analysis data are all 
punched into cards, and the statement of each item is typed 
on the back of the card. This provides a very complete and 
usable library of test items. The actual steps in preparing 
the cards for this library are: 

1d. Pre-punching a deck of cards to show the course and the date
the test was given.

2d. Punching the validity data (V, D, H, and L) and the course
content into the cards prepared in 1d. One digit is used for
each validity datum, i.e., a validity correlation coefficient of
.30-.39 is written as 3 and for the per cents for D, H, and L
the units value is chopped; for example, a per cent of 60-69 is
punched as 6. This is done to conserve columns on the cards.
(Of course, this procedure will mean that the punched values
will average about .05 or 5 lower than the computed values.)

3d. The test item involved is typed on the back of the card which
carries the corresponding item analysis data.

4d. The typing is proofed, the correct answer indicated, and the
item is filed according to test content.
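
The one-digit coding of step 2d can be written out directly. The sketch below assumes non-negative values below 1.00 (or below 100 per cent), because the text does not say how a negative coefficient or a value of exactly 100 per cent was punched; the function names are hypothetical.

```python
def code_percent(per_cent):
    """Keep only the tens digit of a whole-number per cent (60-69 -> 6)."""
    if not 0 <= per_cent <= 99:
        raise ValueError("one column holds only per cents from 0 to 99")
    return int(per_cent) // 10


def code_validity(v):
    """Keep only the first decimal digit of a coefficient (.30-.39 -> 3)."""
    hundredths = int(round(v * 100))    # work in whole hundredths
    if not 0 <= hundredths <= 99:
        raise ValueError("one column holds only coefficients from .00 to .99")
    return hundredths // 10
```

Dropping the units digit loses between 0 and 9 units; averaged over all per cents from 0 to 99 the loss is 4.5, which agrees with the remark that punched values run about 5 (or .05) below the computed ones.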

The information which is punched into these cards makes 
it possible to select mechanically items according to validity, 
difficulty, and course content and to make various types of 
statistical studies. 





MACHINES IN CIVIL SERVICE TESTING 


SIDNEY W. KORAN 

Employment Board, Pennsylvania Department of Public Assistance


EDITOR’S NOTE: This article is an abridged version
of a lecture recently presented by the author before an in-service
training class comprising staff members of the examination division
of a large state civil service agency. It is offered here because
it pulls together, for what is probably the first time, numerous
loose ends of the important body of knowledge that is beginning
to come into existence on the mechanization of civil service examining
processes. The article comprises a description of the
purpose, design and operation of the I.B.M. scoring machine,
a discussion of the limitations of the scoring machine in connection
with the conduct of examinations, information on adapting
tests to machine scoring, descriptions of procedures for scoring
tests using the I.B.M. and other machines, information on scoring
various types of rating scales by machine, material on the uses of
the scoring machine in item analysis and in the computation of
several statistical measures, and a summary of the place of tabulating
equipment in the conduct of certain examination tasks.

The I.B.M. Scoring Machine 

The I.B.M. scoring machine was designed to meet a very
real and pressing need in the field of educational and psychological
testing for a method of scoring objective tests that
would combine speed, accuracy, and low cost to an extent considerably
beyond that possible with any technique previously
developed. The recognition of that need was sufficiently great
to stimulate, over a period of years, the development of
numerous ingenious techniques as well as several different types
of automatic and semi-automatic devices. The one, however,
which appears to have been the most successful and to have
earned the widest application to civil service use is the comparatively
new I.B.M. scoring machine.



The operation of this machine is dependent upon the principle
that the mark made by a pencil having a soft lead will
conduct electricity. In order to score a test by the machine
it must be designed so that the examinee may indicate his answers
to the questions by placing pencil marks in certain predetermined
and properly labeled positions on a sheet of paper
which is either separate from the test booklet or may later be
separated from it. For consistently satisfactory results it is
advisable to furnish examinees with mechanical pencils
equipped with special high-graphite-content leads and to
use answer sheets that have been carefully and accurately
printed so that the location of each one of the 750 possible
response positions on either side of the sheet corresponds
within fairly close limits to the location of each of the 750
sets of contacts within the machine.

Each of these sets of contacts consists of five small parallel
blades insulated from one another and connected alternately to
the positive and negative sides of the electrical circuit, the
current for which is furnished by several conventional radio "B"
batteries. When an answer sheet is inserted in the machine
for scoring, it is pressed against a plate containing the 750 sets
of contacts. Whenever one of these sets of contacts presses
against a pencil mark, the latter, since it is a conductor,
permits current to flow across one or more of the four gaps made
by the five parallel blades. The length of the pencil mark
determines whether one, two, three, or four of these gaps will 
be bridged. If the examinee follows instructions and makes his 
pencil mark sufficiently long, it will bridge all four of the gaps. 
By designing the machine so that there are several millions of 
ohms of resistance in series with each set of contacts, current 
differences which result when some of the examinee’s marks 
are not long enough to bridge all four gaps are minimized
sufficiently to prevent their having an appreciable effect on the
score.
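
The effect of the series resistance can be illustrated with Ohm's law. The sketch below is not drawn from the machine's specifications: the battery voltage, the series resistance, and the resistance of one bridged gap are hypothetical values chosen only to show the order of magnitude, and the bridged gaps are assumed to act as equal resistances in parallel.

```python
# All component values are illustrative assumptions, not I.B.M. specifications.
BATTERY_VOLTS = 90.0        # hypothetical "B" battery voltage
SERIES_OHMS = 5_000_000.0   # "several millions of ohms" in series (assumed 5 megohms)
GAP_OHMS = 100_000.0        # hypothetical resistance of one bridged gap


def mark_current(gaps_bridged: int) -> float:
    """Current in amperes through one set of contacts, by Ohm's law."""
    if not 1 <= gaps_bridged <= 4:
        raise ValueError("a mark bridges between one and four gaps")
    # Bridged gaps are modeled as equal resistances in parallel.
    mark_ohms = GAP_OHMS / gaps_bridged
    return BATTERY_VOLTS / (SERIES_OHMS + mark_ohms)


def relative_shortfall(gaps_bridged: int) -> float:
    """Fraction by which a short mark's current falls below a full-length mark's."""
    full = mark_current(4)
    return (full - mark_current(gaps_bridged)) / full


for gaps in range(1, 5):
    print(gaps, f"{relative_shortfall(gaps):.2%}")
```

Under these assumed values a mark that bridges only one gap draws roughly 1.5 percent less current than a full-length mark, which is the sense in which a large series resistance keeps short marks from appreciably affecting the meter reading.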

The resistance in each circuit is such that when the
appropriate rheostats have been adjusted properly, a single unit of
current is registered for each set of contacts that may be
pressed against a pencil mark. Scores are read on a meter 
which has been calibrated in terms of these units. The use of 
switches and other accessories makes it possible to read rights, 
wrongs, omits, rights minus wrongs, rights plus wrongs, rights 
minus or plus a fraction of wrongs, etc. Whether any given 
choice, or answer position, will be counted as right or as wrong, 
or whether it will be eliminated from the scoring altogether, 
is determined by the manner in which holes have been punched 
in the set of keys inserted in the scoring rack. By the use of
switches and the proper perforation of field selection holes in 
the scoring key, the machine may be adjusted so that the meter 
will read the score for all of the items on the answer sheet 
or for certain combinations of the ten 15-item fields, or both. 
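
The score types and field selection described above can be sketched in a few lines. This is only an illustration of the arithmetic, not a model of the machine's circuitry: the function names, the dictionary layout, and the default one-fourth weight for wrongs are assumptions made for the example, and sheets carrying more than one mark per item are not handled.

```python
from fractions import Fraction

FIELD_SIZE = 15  # the text describes ten 15-item fields


def tally(key, responses, fields=None):
    """Count rights, wrongs, and omits.

    key       -- dict: item number -> correct response position (1-5)
    responses -- dict: item number -> marked position; unmarked items are absent
    fields    -- optional set of field numbers (1-10) to which the count is limited
    """
    rights = wrongs = omits = 0
    for item, correct in key.items():
        field = (item - 1) // FIELD_SIZE + 1
        if fields is not None and field not in fields:
            continue
        marked = responses.get(item)
        if marked is None:
            omits += 1
        elif marked == correct:
            rights += 1
        else:
            wrongs += 1
    return rights, wrongs, omits


def formula_score(rights, wrongs, wrong_weight=Fraction(1, 4)):
    """Rights minus a fraction of wrongs; the fraction is chosen by the examiner."""
    return rights - wrong_weight * wrongs


key = {1: 2, 2: 1, 3: 3, 4: 5, 16: 1}
responses = {1: 2, 2: 3, 3: 3, 16: 1}
print(tally(key, responses))               # whole sheet
print(tally(key, responses, fields={2}))   # second field only (items 16-30)
```

Restricting `fields` plays the part of the field selection holes: the same sheet yields a total score or a score for chosen fields without being re-marked.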


Machine-Scorable Answer Sheets 

Standard answer sheets designed to fit several types of 
general situations are available from the manufacturer of the 
machine, and it is also possible to have special answer sheets, 
designed to meet certain specific requirements, printed to order.
Unless ordered in fairly large quantities, however, special 
answer sheets are usually more expensive than standard answer 
sheets and their use frequently introduces an additional time 
element into the planning of examinations. 

Some of the agencies, in an effort to take advantage of 
the economies afforded by quantity purchases and to enjoy 
the convenience of having their own stock on hand, have 
standardized their major requirements sufficiently to permit 
them to order relatively large quantities of three or four types 
of answer sheets printed only with the name of the agency, 
the item numbers, and the response positions. As new exam¬ 
inations come up, these agencies select the type of answer sheet 
which most nearly fits the requirements of the particular situa¬ 
tion and print or multilith, in the left-hand margin, whatever 
additional identifying material is required. 




Limitations Imposed by Machine Scoring 

To the individual who is about to construct a test that is
to be scored by machine the limitations imposed by the machine
method are chiefly three: 1. A separate answer sheet must
ordinarily be used. 2. The response to each question must be
indicated by making a special kind of pencil mark. 3. The
orientation of the response positions on the answer sheet
cannot be altered.

The first of these considerations, that of the use of a 
separate answer sheet, is more correctly a “condition” rather 
than a “limitation,” for even when the machine method of 
scoring is not involved it is usually desirable, when examina¬ 
tions comprising large numbers of items are to be administered 
to any considerable number of individuals, to make use of 
some form of separate answer sheet in order to facilitate the 
scoring process. This is true whether it is planned to do the 
scoring entirely by hand or by some combination of hand- 
scoring and overprinting (with a multilith, for example). 

Another reason why the use of a special answer sheet is 
not ordinarily a serious obstacle is that it is frequently possible, 
if necessary, to design the examination so that the test
questions are printed directly on the answer sheet beside (or
directly over or under) the response positions. This has been
done in the case of a number of standardized educational and 
psychological tests and has also found some, though much more 
limited, use in connection with civil service examinations. While 
this procedure seems to be particularly advantageous when 
used with one- or two-page personality inventories and, as will 
later be pointed out, certain types of rating scales, there are 
ordinarily several objections to its routine use in setting up civil 
service examinations. Among the principal objections are the 
increased trouble and expense caused both by the special print¬ 
ing requirements and by the fact that the relatively small num¬ 
ber of items which may be printed on a letter-size sheet usually 
necessitates using several answer sheets for a test of any ap¬ 
preciable length. The handling, scoring, and computational 
difficulties encountered whenever a single test requires more
than both sides of one answer sheet are ordinarily sufficient
to discourage that practice. There are, however, certain situa¬ 
tions in which the mere fact that several answer sheets will be 
required for a given test may well be a matter of relatively 
minor importance when considered beside the larger aims of 
the examination. 

The second limitation imposed by the use of the scoring 
machine method, that of the necessity for making a special 
kind of pencil mark to indicate the answer to each question, is
apparently proving less of a problem than many expected it 
would. As returns come in on the results of research (1) and 
on the development of improved scoring procedures which 
have reduced the likelihood of scoring errors to a negligible 
figure (3, 11), it is clear that whatever problem is actually 
presented by this limitation may be pretty adequately neutral¬ 
ized by taking the following three steps: 


1. Include in the examination announcement a section comprising
(a) an explanation of the types of questions that will be used, (b) a
statement of the fact that the test will be scored by a machine which
will provide the correct score if the examinee follows all instructions
carefully, (c) a list of the rules which must be followed in indicating
the answers, and (d) a group of sample questions printed alongside a
specimen portion of the answer sheet on which the answers to some of
the sample questions have been properly marked and the remainder left
for the examinee to complete as a practice exercise.¹

2. Provide the examinee, at the time the examination is
administered, with a page of instructions, similar to those described above,
which he is given time to review sufficiently long to refresh his memory
and to which he may refer at any time during the examination. (One
example of such an approach is the Directions for Using the Answer
Sheet, reproduced on the following pages, which has been used in
Pennsylvania since 1939 by the Employment Board of the Department of
Public Assistance with examinations conducted for approximately
115,000 persons.)


3. Furnish, throughout the test, additional adequate and clear
directions including, wherever a somewhat different approach is employed, a
sample question properly answered (7). (Several illustrations of such
special directions appear among the examples of test material presented
later in this article.)


1 ... the page entitled Sample Questions for General Test which
appears as part of current U. S. Civil Service Commission ...
written examinations which will include machine-scored ...




DIRECTIONS FOR USING THE ANSWER SHEET 

All of the answers in the test you are about to take are to be recorded
on special ANSWER SHEETS instead of in the Question Booklet. To
receive credit for your answers they must be recorded in the proper
spaces on your ANSWER SHEET.

ANSWER SHEETS will be scored by an electrical test-scoring
machine. In order for your test to be scored accurately, it is necessary for
you to observe the following directions carefully:

1. Read each question and its numbered answers and decide which answer is
correct.

2. Find the pair of dotted lines numbered the same as the answer you have
chosen as being correct, and blacken this space with your pencil. Be sure
that the space you mark is in the row numbered the same as the question you
are answering. Misplaced answers are counted as wrong answers.

3. Indicate each of your answers with a vertical solid black pencil mark. Solid
black marks are made by going over each mark two or three times and by
pressing firmly on your pencil.

4. If you change your mind, erase your first mark completely, then mark the
correct space. Blacken one space only for each question number.

5. Do not rest the point of your pencil on the ANSWER SHEET while you
are considering your answer and do not make unnecessary marks.

6. Keep your ANSWER SHEET on a hard surface while marking your
answers.

7. Make your marks as long as the pair of dotted lines.

Below are some sample questions to give you practice in using the ANSWER
SHEET. The questions at the left are similar to those you will find in your
Question Booklet. At the right is an illustration of a portion of an ANSWER
SHEET. The answers to the first four questions have already been marked on
the ANSWER SHEET. Study the questions and note the way the answers to
them have been marked on the ANSWER SHEET. Then answer each of the
remaining questions in exactly the same way; that is, by making a heavy black
mark on the ANSWER SHEET in the space numbered the same as the correct
answer.

Questions for Practice 

1. The third month of the year is: (1) February; (2) March;
(3) January.

2. The capital of Pennsylvania is: (1) Harrisburg; (2) Albany;
(3) Boston.

3. The Governor of Pennsylvania is: (1) John Garner; (2) Alfred
Landon; (3) Arthur James.

4. If one pencil costs one cent, five pencils will cost: (1) three
cents; (2) six cents; (3) four cents; (4) two cents; (5) five cents.

5. George Washington was the first President of the United States.
(1) True. (2) False.

6. Every calendar year has: (1) eleven months; (2) ten months;
(3) twelve months.

7. The fuel most commonly used in automobiles is: (1) kerosene;
(2) carbona; (3) crude oil; (4) gasoline.

8. The sum of six and four is: (1) five; (2) six; (3) eight;
(4) nine; (5) ten.

[Illustration: portion of an ANSWER SHEET with rows numbered 1 to 8,
each offering response spaces 1 to 5. The rows for questions 1 to 4
are already marked; rows 5 to 8 are left blank for the examinee.]

NOTE. The Answer Sheet provides spaces for recording five different
choices for each question. Some of the questions in your examination booklet
may contain only two, or three, or four choices. When answering a question
containing fewer than five choices, you are to ignore the additional spaces
printed on the Answer Sheet for that question.

The last of the three limitations mentioned as imposed by 
the machine method of scoring has to do with the fact that 
the relative location of each of the 750 response positions on
the answer sheet is fixed and may not be changed by the test 
constructor. It is in meeting this difficulty that the test techni¬ 
cian's ingenuity enters the picture. 

Despite the handicaps of perpetually imminent deadlines
and understaffed examination units (two well-known
characteristics of the conditions under which many civil service
commissions work), a sufficient number and variety of adaptations of
test material to this particular limitation of the machine
method have been produced in the short space of a few years
to warrant the conclusion that the use of separate answer
sheets involving fixed response positions offers no particularly
serious obstacle to the construction of objective test material.
In fact, what used to be a double bottleneck (construction and
scoring) has, because of the advantages offered by the
scoring machine, been reduced to but a single obstruction.
Objective tests may now be scored so cheaply that the major
remaining obstacle to the wider use of better objective tests appears
to be the difficulty of constructing them under the usually
prevailing conditions of insufficient time and the not-too-ready
availability of trained examination technicians.

Constructing Items for Machine Scoring 
In general, the basic rules to be followed and the pitfalls
to be avoided in the construction of good objective-type items
are as applicable to items that are to be scored by machine as
they are to items that are to be answered either directly in the
test booklet or on a separate sheet designed to be scored
manually. An item that is “tricky” or that contains an ambiguous
or ludicrous statement is unacceptable for reasons quite apart
from the scoring method that will be employed. While there
are certain additional points that must be kept in mind (mostly
with regard to adequate instructions to the examinee and strict
adherence both to the physical limitations of the answer sheet
and the electrical limitations of the machine itself),
fundamentally, an item that is unsatisfactory for any reason that
would interfere with its suitability as an ordinary objective-type
question is equally unsatisfactory for use in a
machine-scorable test. Here, however, are some of the considerations
which appear to be sufficiently peculiar to the use of
machine-scorable answer sheets to warrant enumeration and brief
discussion:

1. Choice of answer sheets. Wherever practicable, test
items should be designed to make use of standard answer
sheet forms. Doing so keeps down construction time as well as
costs and obviates the necessity for presenting special
instructions not covered by the general directions printed in the
announcement and furnished the examinee at the time of the
examination. The use of standard answer sheets possesses the
additional advantage of capitalizing on the fact that, since
machine-scored tests are being used more and more widely by
civil service agencies and educational institutions, an
increasingly large proportion of the civil service test-taking
population may be expected to have sufficient previous experience with
standard forms of separate answer sheets to permit them to
concentrate on the test material with a minimum of distraction
and tension.

2. Adequate instructions. Instructions for indicating
answers to such specialized subtests as those involving
alphabetizing, proof-reading, checking, sorting, filing, punctuation,
and similar tasks should be adequate and should preferably
include a sample exercise properly answered. In writing these
directions the kind of language, specificity, and need for
examples will, of course, vary according to the level of the
job for which the examination is being designed. In general,
however, it is safer to be somewhat too specific and to provide
what, to the sophisticated test-taker or Ph.D. test constructor,
may sometimes appear to be an unnecessary example, than to
take too much for granted on the part of the examinee.

It is sometimes argued that an examinee who can’t follow 
such simple instructions “doesn’t deserve to pass” or “couldn’t
do the job anyway.” While there are certainly times when this
stand appears justifiable, the writer’s opinion is that it is always 
safer to provide, if for no other reason than the maintenance 
of satisfactory public relations, directions that meet the high¬ 
est standards of adequacy. If it is desired to test the exam¬ 
inee’s ability to follow instructions, one should use a test 
designed to do just that, rather than take the chance of meas¬ 
uring such a trait by using “complicated” (in this case, a 
euphemism for “inadequate” or “unsatisfactory”) instructions 
which are likely to result in a distribution of scores unduly 
influenced either by the degree of an examinee’s previous 
experience with new types of tests or by the extent to which he 
exhibits caution in situations of this kind. 

In this connection it sometimes occurs, in the construction of
a test for a position such as building superintendent, that by
the time the test constructor has finished adapting a certain
bit of practical material to the limitations of the answer sheet,
his product, regardless of how cleverly worked out, is no
longer suitable for that particular level of job. What he has
may be an excellent test of intelligence, but of a higher level
than that required of a building superintendent. It hurts,
sometimes, to have to discard or extensively modify a brain
child of that kind, but it has to be done.

3. Item sequence. The sequence of items in the test should 
be such that subtests or item-groups that are to be weighted 
differently from other portions of the test or for which a 
separate score will be desired, will fall entirely within one or 
more fields. If, for example, in a test for key punch operators
a separate score will be required for a 30-item subtest on the
subject of coding, the items for the entire test of which the 
coding items are a part should be arranged so that the item 
numbers (if a standard answer sheet is used) will start with 
1, or 16, or 31, or 46, etc. When the test is being scored it will 
then be possible, if the proper field selection holes have been 
punched in the answer key, to read the score of the subtest 
with the expenditure of no more additional effort than is re¬ 
quired for turning a knob while the answer sheet is in the 
machine. Similar treatment should be accorded item groups 
to which a correction formula is to be applied that is different 
from that used for any other part or parts of the entire test. 
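
The field arithmetic behind this rule can be checked mechanically. The sketch below assumes the standard layout of 15-item fields described earlier; the function names and the example subtests are hypothetical.

```python
FIELD_SIZE = 15  # items per field on a standard answer sheet


def field_of(item_number: int) -> int:
    """Field (1, 2, 3, ...) in which an item number falls."""
    return (item_number - 1) // FIELD_SIZE + 1


def fields_used(first_item: int, n_items: int) -> list:
    """Fields touched by a run of n_items consecutive items."""
    last_item = first_item + n_items - 1
    return list(range(field_of(first_item), field_of(last_item) + 1))


def shares_no_field(subtests: dict) -> bool:
    """True if no field contains items from two different subtests.

    subtests -- dict: subtest name -> (first item number, number of items)
    """
    seen = set()
    for first_item, n_items in subtests.values():
        for field in fields_used(first_item, n_items):
            if field in seen:
                return False
            seen.add(field)
    return True


print(fields_used(16, 30))   # a 30-item coding subtest starting at item 16
print(shares_no_field({"general": (1, 15), "coding": (16, 30)}))
print(shares_no_field({"general": (1, 20), "coding": (21, 30)}))
```

A subtest that starts at item 1, 16, 31, 46, and so on begins on a field boundary; the second arrangement above fails the check because items 16 to 20 of the general section share a field with the first coding items.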

4. Reducing response errors. Care should be taken to
avoid wording questions and selecting styles of type or print
that are likely to cause the examinee to make unnecessary
clerical errors in recording his responses. This not infrequently
occurs, for example, when Arabic numerals having the same
range as those used to denote response positions are used in
the answer. Answer sheets are available which eliminate this
difficulty by using the letters A, B, C, D, and E to designate
the response positions. Where the response positions are
numbered, however, it is frequently helpful simply to spell out the
numbers from 1 to 5 when they appear alone or almost alone
in the answer. Two simple illustrations of this point are:

“The number of inches in one-third of a foot is: (1)
two; (2) three; (3) four; (4) five; (5) six.” instead of:
“(1) 2; (2) 3; (3) 4; (4) 5; (5) 6.”

“How many persons in the family are eligible to
receive some form of assistance? (1) none; (2) one; (3)
two; (4) three; (5) four.” instead of: “(1) 0; (2) 1;
(3) 2; (4) 3; (5) 4.”

5. Juxtaposition of instructions and related items. Where
use is made of a page of instructions including a key, legend,
code, or similar device likely to have to be referred to
frequently by the examinee in order to answer a given group of
related questions, the format of the examination booklet
should be such that the page containing the instructions will
face at least a group of the questions. Among the possible
exceptions to this rule is the situation in which one of the
functions being tested is the ability to memorize certain
material or relationships, and in which the test is being timed in
an effort to obtain a measure of the examinee’s ability to do so.

6. Completion arithmetic items. The construction of
choices for items involving arithmetic, algebraic, or statistical
problems, or consulting a graph or chart, is no different when
the item is to be scored by machine than by any other method.
There may, however, be situations in which it is considered
important, in connection with a certain group of items in a
test, to know the exact answer arrived at by the examinee as
a result of his calculations. When this is so, it is possible, by
expending some additional time and effort, to retain the
advantages of the completion type test for this particular group
of questions and at the same time have the machine-indicated
score represent the examinee’s achievement in the entire test.
This may be accomplished by designing the answer sheet so
that spaces are provided both for the examinee to write in
his answers to the questions and for a clerk to indicate the
correctness of those answers by making the usual kind of pencil
marks in response positions especially provided for that
purpose.

Two disadvantages of this approach are the need for
special answer sheets and the time and expense involved in
having the completion items scored manually before the test
as a whole can be scored by machine. Another possible
disadvantage is the effect that the use of two types of directions
may have on the examinee’s adherence to the important and
oft-repeated general instructions to “indicate your answer to
each question by making a heavy mark in the appropriate
space on the answer sheet.” This is of some importance, for
the extent to which the examinee can be persuaded to accept the
idea of making proper marks instead of writing answers or
numbers on the separate sheet is one measure of the amount of
machine vs. handscoring that will have to be done. For these
reasons, and because it is probably possible to use the
multiple-choice form of presentation for arithmetic and similar items
without interfering seriously with their validity, it would seem
preferable, ordinarily, to avoid mixing the two types of
responses in the same test.

Beginning on the next page are examples of Test Material
Adapted to Machine Scoring.²

Scoring Civil Service Tests with the I.B.M. Machine: 

Historical Note 

Civil service commissions, while recognizing the speed and 
economy features of the machine-scoring method of rating 
objective tests, were at first quite reluctant to put the machine 
to use. Springing in part, probably, from the usual resistance 
to adopting methods and procedures differing from those 
already in use, the criticism was made that not only did the 
machine method involve the use of special materials to which 
the test-taking public might object, but the scores which it 
turned out were insufficiently accurate. 

This controversy was quietly running its course when an
event occurred in the field of public personnel administration 
that resulted, among many other things, in removing the whole 
question from the talking to the trying stage. Suddenly, all 
over the country, sizeable civil service agencies were coming 
into existence in accordance with the merit system provisions 
of the Social Security legislation. Many of these new com¬ 
missions were faced with the task of examining unprecedented 
numbers of persons within time and budgetary limitations that 
were not easy to meet, and some were composed of commis¬ 
sioners and administrators who were sufficiently new to civil 
service problems to be relatively quite receptive to such new- 


2 These examples of machine-scorable subtests and item-groups are offered
solely for the purpose of illustrating the variety of test material that may be
adapted to machine scoring and the kinds of instructions that may be employed.
The writer wishes to thank the Employment Board of the Pennsylvania
Department of Public Assistance and Miss Hilda P. Thompson, Executive Director,
for their kind permission to use this material, which was developed for the
Board over a period of several years by C. R. Adams, G. K. Bennett, S. W.
Koran, B. V. Moore, E. A. Rundquist, and K. S. Wagoner. C. H. Smeltzer
and M. S. Viteles were employed as consultants to the Board during the period
this material was being developed.




NUMBER AND NAME CHECKING

In each of the following groups, some of the pairs of names
and numbers are exactly the same while others are different.
You are to check, on the connecting line, the pairs which are
different and indicate on your ANSWER SHEET the total number of such
pairs in each group.

[Example group: five pairs of names joined by connecting lines,
among them "Thos. Doe and Co." paired with "Thos. Doe Co." and
"Bart Salt Corp." paired with "Burt Salt Corp."; three of the
five connecting lines carry a check mark.]

In the example above, three pairs are different so you are to
make a heavy mark in space number 3 opposite question No. 61 on
your ANSWER SHEET, thus:

[Illustration: answer-sheet row 61 with space 3 marked.]

Be sure you have marked No. 61 on your ANSWER SHEET, then go
on to No. 62. Work as fast as you can without making mistakes.

[Items 62 and 63: further groups of name pairs in the same format.]

ALPHABETIZING

Rearrange the names in each of the following groups in the order in
which they would appear in an alphabetic file.

List the names in alphabetic order in the blanks at the right. The
number in parentheses after each name must be kept with that name during
the alphabetizing. When you have arranged the names in the blanks,
indicate the alphabetized order by making a heavy mark on your ANSWER SHEET
in the appropriately numbered space opposite the proper question number.

For example, when you alphabetize the names in the first group below,
you will find that the name followed by the number 3 in parentheses belongs
after No. 46. You will therefore make a heavy mark in space number 3 after
question No. 46 on your ANSWER SHEET, thus:

[Illustration: answer-sheet row 46 with space 3 marked.]

Alfred Anthony (1)            46. ____________
Estelle Anthony (2)           47. ____________
A. L. Anthony (3)             48. ____________
Emily M. Anthony (4)          49. ____________
Miss Birdie Anthony (5)
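
The keying of such an alphabetizing item can be sketched as a sort followed by a lookup. The filing rule used here is an assumption, not the Board's stated rule: titles such as "Miss" are disregarded, and names are compared by surname and then by given names or initials. The sample confirms only that item 46 is keyed to space 3; the helper names are hypothetical.

```python
TITLES = {"miss", "mrs.", "mr.", "dr."}  # assumed to be disregarded in filing


def filing_key(full_name: str):
    """Sort key: surname first, then given names or initials, ignoring titles."""
    words = [w for w in full_name.split() if w.lower() not in TITLES]
    surname, given = words[-1], " ".join(words[:-1])
    return (surname.lower(), given.lower())


def answer_positions(names: dict, first_question: int) -> dict:
    """Map each question number to the tag number that should be marked.

    names -- dict: tag number printed in parentheses -> name
    """
    ordered = sorted(names, key=lambda tag: filing_key(names[tag]))
    return {first_question + i: tag for i, tag in enumerate(ordered)}


group = {
    1: "Alfred Anthony",
    2: "Estelle Anthony",
    3: "A. L. Anthony",
    4: "Emily M. Anthony",
    5: "Miss Birdie Anthony",
}
print(answer_positions(group, 46)[46])
```

Under the assumed rule the initials "A. L." file ahead of "Alfred", which agrees with the sample's instruction to mark space 3 for question 46.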













NUMBER, NAME, AND ARITHMETIC CHECKING

DO NOT OPEN THIS BOOKLET UNTIL GIVEN THE SIGNAL BY THE PROCTOR.

Read the following directions very carefully:

The inside pages of this booklet contain three tests which will be timed.

The first test consists of pairs of numbers and the second test of pairs
of names. If the numbers or the names of a pair are exactly the same,
make a heavy mark in space number 1 on your ANSWER SHEET beside the
corresponding number of that pair. If they are different, make a heavy mark in
space number 2 on your ANSWER SHEET.

The third test consists of simple arithmetic calculations. If a problem
is correct, make a heavy mark in space number 1 on your ANSWER SHEET beside
the corresponding number of that problem. If it is not correct, make a
heavy mark in space number 2.

SAMPLES DONE CORRECTLY

Question Booklet                                   Answer Sheet

1. 453 - 435                                       space 2
2. 6125 - 6125                                     space 1
3. William Johnson - William Johnson               space 1
4. Abraham and Link Co. - Abraham and Lind Co.     space 2
5. 4 + 8 = 12                                      space 1
6. 3 x 6 = 15                                      space 2

NOW DO THE SAMPLES BELOW

[Samples 7 to 12, left for the examinee: two pairs of numbers, two
pairs of names, and two arithmetic calculations, beside blank
answer-sheet rows 7 to 12.]

Whenever the proctor says “Stop!” STOP WORK and listen carefully for
further instructions.

FAILURE TO FOLLOW INSTRUCTIONS EXACTLY MAY LOWER YOUR SCORE IN
THE EXAMINATION.

DO NOT OPEN THIS BOOKLET UNTIL GIVEN THE SIGNAL BY THE PROCTOR.

Make a mark in space number 1 if the numbers are exactly the same.
Make a mark in space number 2 if the numbers are different.

Make a mark in space number 1 if the names are exactly the same.
Make a mark in space number 2 if the names are different.

[Items 1 and 2 of the name test: "John L. Frankson" and "Overholt
Tobacco Co.", each paired with a second spelling.]

Make a mark in space number 1 if the calculation is correct.
Make a mark in space number 2 if the calculation is incorrect.
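
The key for a checking test of this kind follows directly from the rule "space 1 if the same or correct, space 2 otherwise." A minimal sketch, with hypothetical function names; the four pairs are the first four samples above, and the two calculations are used only as illustrations:

```python
import operator

OPERATIONS = {"+": operator.add, "-": operator.sub, "x": operator.mul}


def pair_space(left: str, right: str) -> int:
    """Space 1 if the two entries are exactly the same, space 2 otherwise."""
    return 1 if left == right else 2


def calculation_space(a: int, symbol: str, b: int, stated: int) -> int:
    """Space 1 if the stated result is correct, space 2 otherwise."""
    return 1 if OPERATIONS[symbol](a, b) == stated else 2


pairs = [
    ("453", "435"),
    ("6125", "6125"),
    ("William Johnson", "William Johnson"),
    ("Abraham and Link Co.", "Abraham and Lind Co."),
]
print([pair_space(left, right) for left, right in pairs])
print(calculation_space(4, "+", 8, 12), calculation_space(3, "x", 6, 15))
```

The comparison is deliberately exact, so a difference of a single letter or a transposed digit counts as "different," which is what the test is meant to detect.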

















ALPHABETIZING AND SORTING

Each name below represents an addressed letter. You are to determine the
number of letters addressed to each person. First, write the names of the
Junior Visitors, the Senior Visitors, and the Interviewers in alphabetical
order in the spaces provided. Then tabulate, in the space provided, the
number of letters each received. Make a heavy mark on the ANSWER SHEET to
indicate this number. For example, if the first person received 3 letters,
make a heavy mark in space 3 beside the number of that name on the ANSWER
SHEET. Thus, since Juanita Bates is the name which will be first when the
Junior Visitors are arranged in alphabetical order, this name is written at
the top of the Junior Visitor list. Tallying will show that she received
two letters. Hence you will make a heavy mark in space number 2 beside
Question No. 61 on the ANSWER SHEET.

[Sample material: a list of names, each with a position (Junior
Visitor, Senior Visitor, or Interviewer), beside blank lists numbered
from 61 and a space for tallying.]


SORTING

This is a sorting test. The city in which each person lives
is represented by a code symbol. Determine the number of persons
living in each city.

Use the blanks provided in the code list for purposes of
counting. If the code symbol for a city is not listed, count it
as Miscellaneous (No. 30). When you have finished sorting the names,
count the number living in each city and indicate the total for each
city by making a heavy mark in the proper space on your ANSWER SHEET.

For example, if it is found that four persons live in a city
whose code symbol is ..., you would make a mark in space number 4
beside the question number of that code symbol, thus:

[Illustration: an answer-sheet row with space 4 marked.]

[Sample material: a CODE LIST of question numbers with code symbols
(for example 24. PT-2; 25. QR-1), ending with 30. Miscellaneous,
beside columns headed APPLICANT'S NAME and CODE FOR CITY (for
example Rolfe, QR-1; Sporn, QR-1; Hansen, FS-1).]













DIRECTIONS FOR ANSWERING QUESTIONS 121 to 146 INCLUSIVE:
Each paragraph below includes one word which spoils the
meaning of the paragraph. This incorrect word is one of
the five words which have numbers printed just above them.
When you have found the incorrect word, make a heavy mark
on your ANSWER SHEET in the space having the same number
as the incorrect word.

121. Our present defeat by the machinery around us is a permanent
thing, a plateau in our progress to a slaveless world.

122. The development of the higher organisms may be regarded as due ...




FOLUDWING DIRECTIONS 


This is a test of your ability to follow directions.

You are to classify the employees of a department according to the
salary each receives. The schedule of classifications is as follows:

Class 1: $1000 to $1099
Class 2: $1100 to $1199
Class 3: $1200 to $1299
Class 4: $1300 to $1399
Class 5: below $1000 or above $1400

The salary for an account clerk is $1100.
The salary for a mail clerk is $1025.
The salary for a stenographer is $1175.
The salary for a secretary is [illegible].

A Junior employee in any of these positions receives $50 less than
the amount shown, while any Senior employee receives $100 more. For
example, a Senior Stenographer receives $1275 ($100 more than a Stenographer)
while a Junior Stenographer receives $1125 ($50 less than a Stenographer).

After reading the directions you are to classify the positions listed
below according to the salary each receives, with these exceptions:

1. Positions not mentioned in the above directions are to be
placed in class number 5.

2. Individuals having five or more years' experience are to
be placed in class number 5.

3. Individuals with less than 2 years' experience are to be
placed in class number 5.

Indicate your answer by making a heavy mark on the ANSWER SHEET in
the space having the same number as the salary classification.

EXAMPLES

A Stenographer with 3 years of experience will fall into Class 2,
so a heavy mark is made in space number 2 on your ANSWER SHEET.

A Senior Stenographer with 7 years of experience will fall into
Class 5, so a heavy mark is made in space number 5 on your ANSWER SHEET.

Question   Departmental                              Years
Number     Division         Position                 Experience

91.        Accounting       Account Clerk            4
92.        Pay-roll         Stenographer             3
93.        Administrative   Senior Stenographer      5
94.        Filing           Junior Secretary         2
95.        Clerical         Mail Clerk               [illegible]
96.        Filing           Junior Account Clerk     6
97.        [illegible]      Stenographer             [illegible]









CIVIL SERVICE TESTING 


T-F PAIRS IN FIVE-CHOICE FORM 

This part of the examination consists of 50 questions, each made up of
two statements, labelled A and B. You are to determine the truth or
falsity of each of the statements. Having done so, you are to indicate
your answers on the ANSWER SHEET as follows:

1. If you consider that the answer to either or both
statements in any question cannot be known, make a
heavy mark in space number 1 on your ANSWER SHEET.

2. If you consider both statements in any question to be
true, make a heavy mark in space number 2 on your
ANSWER SHEET.

3. If you consider the first statement to be true and the
second statement to be false, make a heavy mark in
space number 3 on your ANSWER SHEET.

4. If you consider both statements to be false, make a
heavy mark in space number 4 on your ANSWER SHEET.

5. If you consider the first statement to be false, and
the second statement to be true, make a heavy mark in
space number 5 on your ANSWER SHEET.

For your convenience, these directions are summarized
below.

Mark 1, if none of the answers below applies.
Mark 2, if both statements are true.
Mark 3, if first statement is true, second is false.
Mark 4, if both statements are false.
Mark 5, if first statement is false, second is true.

SAMPLE

(A) All public agency employees are happy.
(B) Federal grants to States for public assistance will be drastically
changed by 1950.

In the above question, the first statement is false. However,
nobody can know whether the second statement is true or false.
Consequently, the answer would be marked 1.

76. (A) A person who does not look you in the eye is likely to be dishonest.
(B) An interview with an emotionally upset client should be postponed
until another day.

77. (A) Public records and documents are an optional source for verification
of eligibility of applicants for public assistance by the visitor.
(B) Information is not included in the index of a social service
exchange as to the treatment given to a registered individual.

78. (A) All property of a recipient of old age assistance is considered part
of the recipient's estate in the Probate Court.
(B) The fact that an aged person is a recipient of a pension from some ...






TOPICAL FILING 


The five classifications in a subject file are as follows:

1. Accounting (includes Accounts)
2. Administration (includes personnel)
3. Maintenance (includes equipment and supplies)
4. Sales
5. Transportation

Below is a series of topical sentences, names of catalogs, and the like
which you are to classify according to the above five divisions. If the
statement would be most logically classified under accounting, make a heavy
mark on the ANSWER SHEET in space number 1; if it refers to administration,
make the mark in space number 2, and so on.

EXAMPLES                                            Make a mark in
                                                    space number
A copy of the building                                   3
Regulations governing the wrapping of packages           5
A copy of corporation laws                               2
Instructions to salesmen                                 4
Profit and loss statement                                1

121. There is no express office in Fraser.
122. I am interested in learning more about your product.
123. I fear the chief account clerk will have to be discharged.
124. A list of freight stations.


PUNCTUATION 


The following selection contains errors in punctuation. Read each
sentence through first to get its meaning. Then correct the errors by
crossing out needless punctuation, changing incorrect punctuation, and
supplying omitted punctuation. Consider each of the following punctuation
marks as one error:

period          semicolon       quotation mark
comma           colon           parenthesis ) or (
hyphen          apostrophe

When you have made all necessary corrections in punctuation, count the
number of errors which occur in each line and make a heavy mark in the
appropriate space on the ANSWER SHEET, thus:

Space number 1 if there is 1 error
Space number 2 if there are 2 or more errors
Space number 3 if there are no errors

151. Copyright laws have been in effect in the U S for more than
152. one hundred years, the first statute being passed May 31, 1790,
153. author or owner, of unpublished material has a common-law





fangled ideas as scoring machines. Also, the State Technical
Advisory Service of the Social Security Board was sufficiently
interested in the mechanization of the selection process to
encourage experimentation in that direction. Thus it happened
that the Employment Board of the Pennsylvania Department
of Public Assistance, a merit system agency whose history dates
back only to the end of 1937, decided to score its examinations
by machine.³ Since then, the number of agencies with scoring
machine installations has been steadily increasing.

Scoring Civil Service Tests with the I.B.M. Machine:
Procedures Ensuring Necessary Accuracy

Insofar as the early reluctance to adopt machine scoring was
based on skepticism concerning its accuracy, it was on firm
ground, for when a civil service agency puts a score on a test
paper, that score must be accurate. There is probably nothing
more likely to undermine the prestige and public acceptance,
if not the very existence under law, of a civil service commission
than the frequent, or even infrequent, discovery of errors
in its work. To the public, the exact nature of the procedures
used by a commission in scoring its papers is relatively
unimportant so long as they are honest and produce correct results.

In the early days of scoring civil service tests by machine it
was thought that the procedures which were satisfactory for
scoring educational achievement and similar tests would be
equally satisfactory for scoring civil service examinations, if
certain additional precautions to spot errors were taken. Such
procedures have, however, been abandoned in almost every
instance in favor of a system designed specifically to meet civil
service commission requirements of accuracy. The procedure
now in use by the majority of agencies gives results that are
probably as accurate as it is possible to obtain while human
beings operate the machines and practical considerations
render it absurd to recheck other than borderline scores beyond

3 Actually, the scoring was performed for the Employment Board by the 
Educational Records Bureau of New York City. 



the point of finding more than one or two errors among
thousands of scores.

This scoring system is suitable when the number of items
answered correctly constitutes the written raw score, and
depends for its accuracy upon recognition of the fact, as stated in
the I.B.M. manual, that “the only truly accurate method of
scoring is the one which takes into account every mark on the
answer sheet which is intended as an answer, making allowance
for questions answered more than once, and eliminating
from the final score all stray marks not intended as answers
by the examinee” (11). The five steps to this procedure are
as follows:

1. Scanning papers for omissions and for items to which more than
one answer has been indicated, and for the purpose of segregating sheets
so poorly marked that they must be scored manually. A check mark is
placed beside each omitted item and the number of items omitted is
indicated in the box provided in the margin of the answer sheet. A
horizontal line is drawn through each item answered more than once
and the number of “surplus” answers indicated in the margin. (All
marks on the answer sheet with the exception of those made by the
examinee should, of course, be recorded with colored pencil.)

2. Scoring for rights on the machine.

3. Scoring for wrongs on the machine.

4. Totaling rights, wrongs, and omits (compensating for items
answered more than once) and checking the total to see whether every
item has been accounted for. If the total checks at this point, no further
operations are necessary except manually scoring every 25th or 50th
paper to provide a spot check of accuracy. (The additional precaution
may be taken of manually scoring the answer sheets of all examinees
whose scores range from two points below to one point above the
passing point.)

5. Adjusting papers on which the total does not check in Step 4.
When this is necessary, the paper goes to an adjuster whose job it is to
determine the reason for the discrepancy and to correct it. Answer
sheets which require such adjustment are then checked by a second
adjuster to ensure accuracy. Answer sheets rejected in Step 1 as
unsuitable for machine scoring are scored manually and checked by these
adjusters or by others especially designated to perform this operation.
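The totaling check in Step 4 can be sketched in a few lines of Python. This is only an illustration, not part of the procedure as published: the names `SheetTally`, `needs_adjustment`, and `is_borderline` are hypothetical, and the sketch assumes that each "surplus" answer noted in Step 1 adds exactly one extra count to the machine's rights or wrongs reading.

```python
from dataclasses import dataclass


@dataclass
class SheetTally:
    """Counts recorded for one answer sheet (hypothetical structure)."""
    rights: int   # meter reading from the rights run (Step 2)
    wrongs: int   # meter reading from the wrongs run (Step 3)
    omits: int    # omitted items counted during scanning (Step 1)
    surplus: int  # extra answers on items marked more than once (Step 1)


def needs_adjustment(tally: SheetTally, n_items: int) -> bool:
    """Return True when the Step 4 total fails to account for every item.

    Assumption: each surplus answer inflates rights + wrongs by one, so it
    is subtracted before comparing with the number of items in the test.
    """
    accounted = tally.rights + tally.wrongs + tally.omits - tally.surplus
    return accounted != n_items


def is_borderline(score: int, passing: int) -> bool:
    """Scores from two points below to one point above the passing point."""
    return passing - 2 <= score <= passing + 1


if __name__ == "__main__":
    sheet = SheetTally(rights=101, wrongs=41, omits=10, surplus=2)
    print(needs_adjustment(sheet, n_items=150))  # total checks, prints False
    print(is_borderline(score=68, passing=70))   # inside the band, prints True
```

A sheet for which `needs_adjustment` returns True would go to the adjuster of Step 5; a sheet for which `is_borderline` returns True would be rescored manually under the optional precaution in Step 4.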

By means of the commoning key now available to scoring
machine users, it is possible to perform a very useful screening
operation in connection with Step 1, described above. This
key may be inserted in the scoring rack between the sensing
and resistance units and the machine then adjusted so that the
meter will indicate, for any given side of an answer sheet, the
number of items attempted. For papers for which the machine
adjusted in this way indicates “no omits,” the task of scanning
may be reduced to looking for items with multiple markings
and papers which it is desirable to score manually.

A further refinement to this system may be wired into the
machine so that the meter, instead of indicating the number
of items attempted, will read the number omitted. This is
accomplished by adjusting the circuit so that sufficient current
will flow through the meter initially to indicate the total number
of items in the test. Then, when an answer sheet is placed
in the machine, the number of items attempted will automatically
be subtracted from the initial reading, causing the
meter to indicate the number of omissions.

Before leaving the subject of scoring, a few words of caution
may be in order. The machine process in its present state
of perfection is a big improvement over most other scoring
techniques at present available for use by civil service jurisdictions.
It has not, however, reached the state of refinement
where it can be taken completely for granted that nothing will
go wrong after the machine has been set up. Since every once
in a while something does go wrong, it is necessary, to avoid
later grief, not only to set up checks and controls but to require
strict adherence to them on the part of all staff members
charged with any scoring responsibility.

Other Machine Methods of Scoring 

The I.B.M. scoring machine seems to offer, for ordinary
civil service use, the best all-around solution
to many of the most annoying scoring problems confronting
the medium or large size civil service agency. In addition,
as will be noted later, this particular machine may be adapted
for use in connection with several other examining tasks,
including certain research projects that every civil service agency
has in mind carrying through just as soon as the staff and the
time are available.




There are, however, at least two additional scoring
approaches classifiable under the head of “mechanical” that have
been used to some extent by agencies conducting large
numbers of examinations.

The first of these makes only partial use of a mechanical
device (in this case a multilith, mimeograph, or printing press)
and is more accurately a technique for facilitating manual
scoring than a procedure for scoring by machine. Separate
answer sheets are used on which the examinee indicates his
answers to multiple-choice or true-false items by checking
appropriately numbered spaces. A multilith or mimeograph stencil
is then prepared so that, when the answer sheets are run
through the duplicating machine, a line connecting the correct
answers will be printed over the response positions. When this
has been done, it is possible for a scoring clerk to determine
the number of correct answers simply by counting the number
of responses marked in positions coinciding with the
overprinted line.

This combination machine-manual method is considerably
more rapid and accurate than manual scoring accomplished by
placing a key alongside or over the answer sheet. By altering
the procedure slightly it may also be used to advantage with
completion-type questions. It requires, however, a skilled
duplicating machine operator and the use of a duplicating machine
capable of very accurate registration and extremely little
spoilage. The multilith satisfies these requirements particularly
well, and printing may, of course, be employed. On the
other hand, the mimeograph appears to be less satisfactory.

A procedure, described by Iffert, Bloom, and Beura (6),
for scoring multiple-choice tests by means of tabulating machines
has apparently also been used with some success,
although its chief value would appear to be in connection with
the conduct of examining programs that are quite intimately
tied in with research projects. When this particular method is
employed, the examinee’s answers are usually placed directly
in the test booklet, from which they are later punched into
Hollerith cards and scored by successive runs through a sorter.
Once these cards are punched they are also available for
research purposes and it is comparatively easy to conduct item
analyses and compute correlations with them.

Scoring Graphic Rating Scales by Machine 

Graphic rating scales may be, and have been, scored by the 
I.B.M. scoring machine. “By using the aggregate weighting 
unit of the machine it is possible to obtain the aggregate 
weighted average of as many as 30 variables, each varying in 
size from 1 to 100 and (in groups of three) weighted from 0 
to 20” (10). 

In utilizing this feature of the machine the rating scale is
usually designed so that it is necessary to draw a horizontal
line for each characteristic rated. The length of each such line
determines the score for that particular characteristic, and the
weighted total for all characteristics is indicated on the meter
in the same fashion as any other score. The horizontal lines
should, of course, be drawn with a special pencil, and may be
made by the rater himself or be drawn in later by a clerk.⁴
When the latter plan is used, the rater checks (with a colored
pencil) the point on each line that represents the rating he
wishes to assign and the clerk simply draws a line from the
origin (left) to each check mark. Graphic scales of this type
may be used in connection with oral interviews, service ratings,
or performance test ratings.

Scoring Training and Experience by Machine 

Many civil service commissions include a quantitative rating
of training and experience in the test battery for a majority
of the classes of positions for which they conduct examinations.
It now appears quite likely that a considerable portion of the
computational work connected with the use of the type of
training and experience rating scale (14) employed, in one

⁴ In several informal studies conducted by an agency which formerly used
this type of rating sheet in large quantities it was found that where raters made
check marks only, it was apparently faster to score scales of this kind manually
(by having a clerk place a stencil over the rating sheet and add the numbers
on a comptometer) than to go to the trouble of drawing a line for each
characteristic before running the sheets through the scoring machine.



form or another, by numerous agencies throughout the country
may be performed by machine.

This possibility was brought closer to realization recently
with the development⁵ of a tentative form of machine-scorable
training and experience rating scale which, while it still
remains to be tried in an actual test situation, looks very much
as though it will not only work but possibly be at least a partial
or preliminary answer to the mechanization of this phase
of the selection process.

This adaptation of the machine makes use of the principle
of the previously mentioned commoning key. In marking the
scale the rater simply blackens each space that corresponds to
a type of training or experience possessed by the examinee
whose application is under consideration. As an example of
the possibilities of this approach, provision has been made in
the initially developed form of the scale for recognition of up
to 15 years, in six-month steps, of each of four levels of related
experience, three levels of related undergraduate and
graduate study, and the possession of academic degrees.

Scoring Service Ratings by Machine 

Present service rating instruments take various forms,
many of which may be scored by machine. Before deciding to
adopt such a procedure, however, the numerous factors involved
should be given careful consideration, and the decision
based upon the extent to which machine scoring will contribute
to the economy, speed, and all-around efficiency with which
this particular phase of the program may be administered.

Service rating scales of the graphic type may be scored by
setting them up to utilize the aggregate weighting feature of
the machine. In addition, various adaptations of the graphic
approach may be employed which, depending upon the particular
situation at hand, may be scored by using either the
ordinary answer key form alone or in conjunction with the
more recently available commoning key, with the latter set-up
offering considerably the greater possibilities. Service rating

⁵ This machine-scorable scale was the outgrowth of a discussion participated
in by E. C. Schroedel, G. C. Sloughier, J. H. Pockrass, and S. W. Koran.




forms of the check-list variety may also rather easily be
adapted to scoring by machine.

Item Analysis With the Scoring Machine

A recently developed attachment to the scoring machine is
the graphic item counter, which is available as optional equipment.
This device consists of a plugboard having a plugging
position for each of the 750 response positions on the standard
answer sheet and for each of 90 counters. By means of plugwires,
any response position may be connected to any counter.
When the appropriate response positions and counters have
been wired together, the plugboard is inserted into the machine
in the position normally occupied by the scoring rack.

Using this attachment it is possible to secure, in a single
run through the machine, a graphic count of the marks placed
in up to 90 response positions on 100 answer sheets. If more
than 100 sheets are involved in the study, a separate graphic
count must be made for each group of 100 sheets. If more
than 90 response positions are to be analyzed in a given test,
the plugboard must be rewired and the sheets run through
again for the additional responses. Thus, if in a given analysis
it is desired to determine the number of individuals who
correctly answered each of the 150 five-choice items on a single
side of an answer sheet and the population of the study is 175,
it is necessary to run the 175 sheets through the machine twice,
making separate graphic counts of the first 100 and the last 75
on each of the two runs. Items 1 to 90, inclusive, may be analyzed
on the first run, and the remaining 60 items (91 to 150,
inclusive) on the second run.

If the item analysis is of the variety that requires information
concerning the examinees’ selection of each of the five
possible responses to the 150 items, the sheets will have to be
run through the machine nine times to obtain this information
for each of the 750 possible responses. Whether the items will
be analyzed to the extent of determining the number of examinees
selecting each possible response or be confined to determining
the number of examinees selecting the correct answer
will, of course, depend upon the use or uses for which the
data are intended. The speed of operation of the machine
equipped with the item analysis unit has been reported as
ranging from 400 to 500 papers per hour when 90 responses
are analyzed on each paper. This is considerably faster than
any clerk can perform the job manually.
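The run counts quoted above (two runs for 150 response positions, nine for 750, and a separate graphic count for each group of 100 sheets) follow from simple ceiling division. The sketch below is an illustration only; the function name `graphic_count_plan` is hypothetical, and the default capacities are the figures of 90 counters and 100 sheets given in the text.

```python
import math


def graphic_count_plan(n_positions, n_sheets, counters=90, sheets_per_count=100):
    """Return (wirings, sheet_groups, graphic_counts) for an item analysis.

    wirings        number of plugboard set-ups, one run of the sheets each
    sheet_groups   number of groups of sheets counted separately on each run
    graphic_counts total number of separate graphic counts produced
    """
    wirings = math.ceil(n_positions / counters)
    sheet_groups = math.ceil(n_sheets / sheets_per_count)
    return wirings, sheet_groups, wirings * sheet_groups


if __name__ == "__main__":
    # Correct answers only: 150 positions on 175 sheets.
    print(graphic_count_plan(150, 175))  # (2, 2, 4)
    # All five responses to each of 150 items: 750 positions.
    print(graphic_count_plan(750, 175))  # (9, 2, 18)
```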

Computing Reliability Coefficients, Standard Error of Measurement
and Intercorrelations with the Scoring Machine

During the past few years several techniques have been
developed for using the test scoring machine to facilitate the
computation of such useful measures as intercorrelations,
reliability coefficients, and the standard error of measurement.
While it is beyond the scope of this presentation to go into the
derivation of the formulas that have been developed or to
describe the procedures at any length, mention will be made
of a few of the more important of these applications in order
to illustrate the variety of the scoring machine’s uses in connection
with examination research.

One kind of research, that of investigation into the validity
of individual items by means of the technique of item analysis,
has already been mentioned. Hoyt (5) recently described a
method of computing test reliability which makes further use
of some of the data obtained when such an item analysis is
performed. The procedure which Hoyt suggests was developed
as a practical and simplified application of Richardson
and Kuder’s (9, 15) “method of rational equivalence,” which
produces a coefficient of reliability that in certain respects appears
to be superior to that obtained by using the split-half
correlation method with the Spearman-Brown formula. No
data beyond those obtained when a test is scored and an item
analysis performed are required for substitution into the following
formula (5):

$$ r_{tt} = \frac{n}{n-1} \cdot \frac{kS_s + S_i - T(T + k)}{kS_s - T^2} $$

in which $r_{tt}$ is the reliability of the test, $n$ is the number of
items in the test, $k$ is the number of subjects taking the test,
$T$ is the sum of the scores obtained by all the subjects, $S_s$ is the
sum of the squares of the scores obtained by the subjects, and
$S_i$ is the sum of the squares of the number of correct responses
to each item.
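As a minimal sketch of how the quantities defined above combine, the Python below computes the coefficient from a table of right (1) and wrong (0) responses, one row per subject. The function names and the five-subject example are hypothetical. For comparison, the same data are also put through the usual Kuder-Richardson expression $\frac{n}{n-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)$, on the assumption that this is the "rational equivalence" coefficient the text refers to; the two agree algebraically.

```python
def hoyt_reliability(responses):
    """Reliability from n, k, T, Ss and Si as defined in the text.

    responses: list of rows, one per subject, each a list of 0/1 item scores.
    Undefined (division by zero) when every subject has the same score.
    """
    k = len(responses)                                   # number of subjects
    n = len(responses[0])                                # number of items
    scores = [sum(row) for row in responses]             # score of each subject
    item_counts = [sum(col) for col in zip(*responses)]  # rights per item
    T = sum(scores)
    Ss = sum(s * s for s in scores)
    Si = sum(c * c for c in item_counts)
    return (n / (n - 1)) * (k * Ss + Si - T * (T + k)) / (k * Ss - T * T)


def kr20(responses):
    """The same coefficient from item difficulties and score variance."""
    k = len(responses)
    n = len(responses[0])
    scores = [sum(row) for row in responses]
    mean_score = sum(scores) / k
    variance = sum((s - mean_score) ** 2 for s in scores) / k
    sum_pq = sum((sum(col) / k) * (1 - sum(col) / k) for col in zip(*responses))
    return (n / (n - 1)) * (1 - sum_pq / variance)


if __name__ == "__main__":
    data = [
        [1, 1, 1, 1],
        [1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    # Here k = 5, n = 4, T = 10, Ss = 30, Si = 30.
    print(round(hoyt_reliability(data), 4))  # 0.8
    print(round(kr20(data), 4))              # 0.8
```

The item counts needed for $S_i$ are exactly what an item analysis of correct answers supplies, which is why no additional data are required.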

Although the Kuder-Richardson technique has numerous
advantages which make its wider adoption quite likely, many
investigators in the field of civil service examinations may have
to continue to obtain most of their reliability coefficients by
means of the split-half method used in conjunction with the
Spearman-Brown formula. This, at any rate, will probably continue
to be the situation unless item analysis data are available
for substitution into a formula such as the one above, or the
simpler formula presented by Kuder and Richardson⁶ is
appropriate in the specific situation.

When it is known at the outset of scoring a given test that
a split-half reliability coefficient will be required, it is possible
to prepare the scoring matrices so that separate scores for the
odd-numbered and for the even-numbered items will be obtained
when the answer sheets are run through the machine for
the first time. This may be accomplished by keying the
odd-numbered items in the usual fashion (that is, as rights), and
the even-numbered items as wrongs (that is, preparing the
scoring matrix so that even-numbered items answered correctly
will be indicated when the selector switch is in the wrongs
position). To secure the total rights score when the machine
is set up in this way, the selector switch need only be moved
to the R + W position. When the selector switch is moved to
the R position the meter will indicate the number of
odd-numbered items answered correctly, and when it is in the W
position, the number of even-numbered items answered
correctly.

Far from being extra work, this procedure possesses the 

⁶ It should be noted that Kuder and Richardson (9) have derived a simpler
formula (No. 21 in the article referred to) which can be computed in two
or three minutes, given the number of items in the test, the average score, and
the standard deviation of the scores. It gives a slight underestimate of the true
reliability of a test. For the more reliable tests the estimates obtained by this
formula are usually from .01 to .03 less than those obtained by use of the
more rigorous formulas presented.



advantage of providing a useful check on the total score. As
only half of the sets of contacts are connected to the meter
when the switch is in either the R position or the W position,
the effect of stray marks on the reading is frequently minimized
to the point where, on some papers, the total of the
separate R and W readings (added without the use of the
scoring machine) may provide a more accurate score than the
R + W (total rights) reading taken alone. The reason for
this is, of course, that while the effect of stray marks and poor
erasures may be sufficient to influence the score when 150 sets
of contacts are in the circuit, their effect may not be noticeable
when only 75 sets of contacts are involved at a time.⁷

It might be well, at this point, to call attention to the fact
that when the “rights plus wrongs plus omits” scoring procedure
described earlier in this paper is employed with a single
machine set up to read rights and wrongs on a single insertion,
it is not possible to secure the split-half scores at the same
time in the fashion just suggested. This problem may, however,
be solved by reading the rights on one run through the
machine and the wrongs on a subsequent run.

Several formulas are available for determining split-half
reliability. In place of the orthodox product-moment formula,
some investigators prefer, under certain circumstances, a
formula which requires the substitution of values obtained
from the odd scores and total right scores only. As described
by Mosier (13) this formula takes the form:

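$$ r_{oe} = \frac{r_{ot}\,\sigma_t - \sigma_o}{\sqrt{\sigma_o^2 + \sigma_t^2 - 2\,r_{ot}\,\sigma_o\,\sigma_t}} $$

in which $r_{oe}$ is the correlation between the odd and even scores,
$r_{ot}$ is the correlation between the odd scores and the total right
scores, and $\sigma_o$ and $\sigma_t$ are the standard deviations of the odd
scores and the total right scores.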


⁷ The problem presented by the necessity for manually scoring considerable
numbers of answer sheets because the effect of stray marks causes the total of
rights, wrongs, and omits to differ from the total number of items is worthy of
attention. The kind of solution suggested above in connection with the split-half
scoring of rights may, if necessary, be adopted as a regular practice in scoring
wrongs as well as rights. Because of the considerably larger answer sheet area
exposed to “live” contacts when the wrongs score is being determined, such readings
are usually affected by stray marks and poor erasures to a greater degree
than rights scores. When split-half scores are not required, the answer sheet
area may, of course, be divided into two or three sections by punching appropriate
field selection holes so that a separate reading may be obtained for
each area.




Still another approach—and one which has been gaining 
considerable favor recently with scoring machine users — 
makes use of the fact that once the machine has been set up 
to indicate odds and evens, the difference between these values 
may be obtained by simply turning the selector switch to the 
R-W position. The standard deviation of the difference scores 
thus obtained for a given test will, as has been pointed out by 
Rulon (16), equal the standard error of measurement of the 
scores in that group. A split-half reliability coefficient may 
then be obtained by substituting in the following simple
formula (2):

$$ r_{tt} = 1 - \frac{\sigma_d^2}{\sigma_t^2} $$

in which $r_{tt}$ is the reliability of the test, $\sigma_d$ is the standard
deviation of the difference between the odd and even scores, and
$\sigma_t$ is the standard deviation of the ordinary test scores including
both odds and evens.
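The split-half computations discussed in this section can be sketched as follows, starting from the odd and even scores that the R and W switch positions would supply. This is an illustration under stated assumptions, not a description of any published routine: the function names and the five-examinee data are hypothetical, population (not sample) standard deviations are used throughout, and the odd-and-total shortcut is written as the identity that follows from the variance of a sum, which is assumed here to be the form Mosier describes.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Product-moment correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def rulon(odd, even):
    """Return (standard error of measurement, reliability).

    The standard deviation of the odd-minus-even differences is taken as
    the standard error of measurement; reliability is 1 - var_d / var_t.
    """
    total = [o + e for o, e in zip(odd, even)]
    diff = [o - e for o, e in zip(odd, even)]
    sd_d, sd_t = pstdev(diff), pstdev(total)
    return sd_d, 1 - sd_d ** 2 / sd_t ** 2


def half_correlation_from_totals(odd, total):
    """Odd-even correlation obtained from odd and total scores only."""
    r_ot = pearson(odd, total)
    sd_o, sd_t = pstdev(odd), pstdev(total)
    return (r_ot * sd_t - sd_o) / sqrt(sd_o ** 2 + sd_t ** 2 - 2 * r_ot * sd_o * sd_t)


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


if __name__ == "__main__":
    odd = [10, 8, 7, 5, 4]    # R readings (odd-numbered items right)
    even = [9, 8, 6, 6, 3]    # W readings (even-numbered items right)
    total = [o + e for o, e in zip(odd, even)]   # R + W readings
    sem, r_rulon = rulon(odd, even)
    r_half = half_correlation_from_totals(odd, total)
    print(round(sem, 4), round(r_rulon, 4))
    print(round(r_half, 4), round(spearman_brown(r_half), 4))
```

The two reliability estimates printed need not coincide: the difference-score formula and the stepped-up half correlation agree only when the two halves have equal variances.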

A method making it possible to use the scoring machine
for calculating tables of intercorrelations has been developed
by Kuder (8) but has apparently not been very widely used,
perhaps because of the laborious clerical work involved in
preparing the coded answer sheets required for the computations.
The work thus involved has lately been reduced, however,
in a revised and simplified procedure suitable for studies
involving up to 150 cases. The Kuder technique is an adaptation
of the Royer-Toops method of obtaining correlations
from Hollerith cards on which geometric codes of scores for
each variable have been punched. Kuder’s approach has been
to use an answer sheet and stencil for each code, but he does
not recommend substituting the scoring machine for Hollerith
equipment when the latter is easily available (8).

Kuder has also pointed out that the scoring machine is
excellently suited for obtaining tetrachoric coefficients of correlation
and that the procedure for doing so is relatively simple.
Since in tetrachoric correlation each variable is divided
into a dichotomy instead of into class intervals, the amount of
“coding” and clerical work involved is reduced to a minimum.



Tabulating Equipment 

It has been possible to note, during the past few years, a
considerable increase in the number of civil service agencies
that have adopted mechanical procedures for handling, in addition
to scoring, such other operations as assigning candidates
to the written, oral, and performance tests; computing grades;
notifying candidates of their eligibility and ineligibility; establishing
registers of eligibles and lists of candidates who did not
qualify; certifying names from registers; maintaining miscellaneous
personnel and payroll records; and aiding in the conduct
of research (3, 4). Let us examine briefly some of the
ways in which Hollerith equipment may be used to facilitate
the conduct of some of the operations connected with the processing
of examinations, keeping in mind, however, the unlikelihood
that a machine installation would prove particularly economical
for conducting these specific operations to the exclusion
of the numerous related tasks just enumerated.*

One of the more important jobs which it is possible to perform
successfully with Hollerith cards is that of converting,
weighting, and combining examiners’ scores on the various
components of the examination, transmuting the results into
final grades, adding veterans’ credit and other bonuses, and
listing, in order of final grade, the names of those who have
qualified.

The tabulating card used for this purpose includes fields
which provide columns into which may be punched such data as
the following, depending upon the needs of the given situation:
identification and file numbers; class of position; written,
training-experience, oral, performance, and service rating raw
and converted (or weighted) scores; total converted score,
final grade, veteran’s credit, rank, et cetera. Written raw
scores and identifying data are punched into what are called
“detail” cards with the electric key punch and are verified by
means of the mechanical verifier. These cards are then arranged
in order of identification number by means of the
horizontal sorter. Raw scores on each subsequent part of the
examination battery are usually first punched into “scratch”
cards which, after being verified, are sorted according to
identification number in the same fashion as the detail cards.
When this has been done, the data on the scratch cards are
transferred to appropriate columns of the detail cards through
use of the automatic reproducing punch (3).

* An exception to this might be the situation in which certain equipment
of another agency is available for part-time use so that the only additional
machines required are a key punch and verifier, and possibly a sorter.

After the raw scores have been punched into the detail
cards, it is usually necessary to convert them into whatever
variety of transmuted score the agency uses for the purpose
of assigning the announced weight to each component of the
examination. The raw score data which ordinarily serve as
the basis for computing the conversion tables are easily secured
from the cards by sorting them by raw score and running
them through a numeric tabulator.

Several methods of transferring the transmuted scores to
the detail cards may be used. Conversion tables may be prepared
for use by key punch operators who determine the converted
score corresponding to each raw score and punch that
figure into the detail card (4). A second method that may
be employed calls for using the automatic multiplying punch
for the purpose of multiplying the raw score by some constant
(for example, the reciprocal of the number of questions in
the written test, if a percentage is desired). A third method
makes use of prepunched master cards each of which contains
a possible raw score and its corresponding conversion. When
using this arrangement, both the master cards and the detail
cards are sorted by raw score and run through an automatic
reproducing punch which transfers the converted scores from
the master cards to the detail cards at a high rate of speed (3).
When the transmuted scores for all components of the examination
have been entered, they may be totalled and the sum
punched into the appropriate columns of the detail card. If
this total requires further conversion, the process involved is
identical to that of transmuting individual raw scores and may
be carried out in any one of the three ways mentioned.
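
The sequence of converting, weighting, totalling, and listing can be sketched in Python as a rough modern analogue of the card operations. Everything specific here is an assumption made for illustration: the field names, the number of items, the weights, the passing grade, and the rule that qualification is judged before veterans' credit is added are all hypothetical and would in practice be fixed by the agency.

```python
from dataclasses import dataclass


@dataclass
class DetailCard:
    ident: int                  # identification number
    written_raw: int            # raw score on the written test
    oral_raw: int               # raw score on the oral test
    veteran_credit: float = 0.0


# Hypothetical constants, chosen only for the example.
WRITTEN_ITEMS = 80
ORAL_MAX = 20
WEIGHTS = {"written": 0.6, "oral": 0.4}
PASSING_GRADE = 70.0


def to_percentage(raw, maximum):
    """Second method in the text: multiply the raw score by a constant."""
    return raw * (100.0 / maximum)


def weighted_total(card):
    """Combine the converted component scores with the announced weights."""
    written = to_percentage(card.written_raw, WRITTEN_ITEMS)
    oral = to_percentage(card.oral_raw, ORAL_MAX)
    return WEIGHTS["written"] * written + WEIGHTS["oral"] * oral


def final_grade(card):
    """Weighted total plus any veterans' credit."""
    return weighted_total(card) + card.veteran_credit


def register(cards):
    """Qualified candidates as (final grade, ident), highest grade first.

    Assumption: a candidate qualifies on the weighted total before credit.
    """
    qualified = [c for c in cards if weighted_total(c) >= PASSING_GRADE]
    rows = [(final_grade(c), c.ident) for c in qualified]
    return sorted(rows, key=lambda row: (-row[0], row[1]))


# Three hypothetical candidates.
cards = [
    DetailCard(101, 64, 16),
    DetailCard(102, 72, 12, veteran_credit=5.0),
    DetailCard(103, 40, 10),
]
for grade, ident in register(cards):
    print(ident, round(grade, 1))
```

A prepunched master card, as in the third method, corresponds to a lookup table from raw score to converted score; `to_percentage` could be replaced by such a table without changing the rest of the sketch.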



In addition to serving the purposes for which they were
designed, the punched cards used in arriving at the examinee’s
final grade and register position are also available for numerous
research uses. Agencies to which the use of a Hollerith
installation is available are in the fortunate position of being
able to perform many of the research jobs discussed as possible
with the scoring machine and, in addition, to make use
of the amazing flexibility of the punched card method to conduct
types of research which, because of the amount of clerical
and statistical work sometimes involved, are all but impracticable
when attempted without such aid.

REFERENCES 

1. Dunlap, Jack W. “Problems Arising from the Use of a Separate Answer
Sheet,” Journal of Psychology, X (1940), 3-48.

2. Flanagan, John C. “Note on Calculating the Standard Error of Measurement
and Reliability Coefficients with the Test Scoring Machine,” Journal
of Applied Psychology, XXIII (1939), 529.

3. Hawthorne, Joseph W., and Morse, Muriel. Business Machines in Public
Personnel Administration. Los Angeles City Civil Service Commission, 1940.

4. Horchow, Reuben. Machines in Civil Service Recruitment. Chicago: Civil
Service Assembly of the U. S. and Canada, Pamphlet No. 14, 1939, 43 pp.

5. Hoyt, C. J. “Note on a Simplified Method of Computing Test Reliability,”
Educational and Psychological Measurement, I (1941), 93-95.

6. Iffert, R. E., Bloom, B. S., and Beum, C. O. Another Test-Scoring Procedure:
A Method of Scoring Short Tests on the Hollerith Sorter. Columbus:
Ohio College Association Bulletin, No. 118, Mimeographed, February 1940,
7 pp.

7. Koran, Sidney W. “Adapting Tests to Machine Scoring,” Journal of Applied
Psychology, XXIII (1939), 709-719.

8. Kuder, G. Frederic. “Use of the International Scoring Machine for the
Rapid Calculation of Tables of Intercorrelations,” Journal of Applied
Psychology, XXII (1938), 587-596.

9. Kuder, G. F., and Richardson, M. W. “The Theory of the Estimation of
Test Reliability,” Psychometrika, II (1937), 151-160.

10. Machine Method of Scoring and Analysing Examinations. New York:
International Business Machines Corporation, undated, 14 pp.

11. Machine Methods of Test Scoring: Manual of Procedures. New York:
International Business Machines Corporation, 1940, 7 pp.

12. Manual of Instruction for the International Test Scoring Machine. New
York: International Business Machines Corporation, 1939, 20 pp.

13. Mosier, Charles I. “A Short Cut in the Estimation of Split-Half Coefficients,”
Educational and Psychological Measurement, I (1941), 407-408.

14. Pockrass, Jack H. “Rating Training and Experience in Merit System Selection,”
Public Personnel Review, II (1941), 211-222.

15. Richardson, M. W., and Kuder, G. F. “The Calculation of Test Reliability
Coefficients Based on the Method of Rational Equivalence,” Journal of
Educational Psychology, XXX (1939), 681-687.

16. Rulon, Phillip J. “A Simplified Procedure for Determining the Reliability
of a Test by Split Halves,” Harvard Educational Review, IX (1939),




PREDICTIVE VALUE OF CERTAIN
“LAW APTITUDE” TESTS¹


E. L. WELKER and T. W. HARRELL
University of Illinois 


This paper reports the second of a series of studies
analyzing the abilities necessary for success in law school.
An earlier study² showed that pre-law grades from one school,
the University of Illinois, correlated higher with law grades
than did the Ferson-Stoddard Law Aptitude Examination.
Combining the test and pre-law grades did not significantly
improve the prediction. The homogeneous parts of the law
aptitude test were correlated separately with law grades and
showed that the memory case questions gave an unquestionably
insignificant correlation. This result is interesting since
the memory material seems to represent a popular stereotype
of what a law student has to do.

Several other investigators have reported comparisons between
test scores and law-school grades, but apparently no
one has previously reported a detailed attempt to analyze
the relation between separate law course grades and part
scores of “law aptitude tests.” The ultimate aim of these
studies is of course to discover tests that will lead to the more
valid prediction of law school success.

The variables included in this study are listed in Table 1.
It will be noted that the tests used were the homogeneous
parts of the Ferson-Stoddard Law Aptitude Examination, in
addition to the homogeneous parts of other selected tests: the
Yale Legal Aptitude Test, the American Council on Education
Psychological Examination, and the comprehension and
speed tests of the Minnesota Reading Examination. Some of
the pre-law grades (variable 26) were approximated where
students attended a school other than Illinois. Average law
grades for the first semester as well as course grades for the
five first-semester courses were used as criteria.

¹This study was made possible through the generous cooperation of Dean
Albert J. Harno, University of Illinois College of Law.

²T. W. Harrell, “Predicting Success of Law School Students,” American
Law School Review, IX (1939), 290-202.

TABLE 1. NAMES AND DESCRIPTIONS OF VARIABLES

Variable
Number   Description              Name
   1     Interpretive case        Ferson-Stoddard Law Aptitude Exam., Part 2-A
   2     Completion               Ferson-Stoddard Law Aptitude Exam., Part 2-B
   3     Relevant facts           Ferson-Stoddard Law Aptitude Exam., Part 2-C
   4     Logical inferences       Ferson-Stoddard Law Aptitude Exam., Part 3
   5     Matching                 Ferson-Stoddard Law Aptitude Exam., Part 4
   6     Memory case              Ferson-Stoddard Law Aptitude Exam., Part 1-B
   7     Arithmetic               ACE Psychological Examination (1938)
   8     Pattern analogies        ACE Psychological Examination (1938)
   9     Number series            ACE Psychological Examination (1938)
  10     Completion               ACE Psychological Examination (1938)
  11     Artificial language      ACE Psychological Examination (1938)
  12     Same-opposite            ACE Psychological Examination (1938)
  13     Reading speed            Minnesota Reading Examinations
  14     Reading comprehension    Minnesota Reading Examinations
  19     Word relations           Yale Legal Aptitude Test, Group I
  20     Opposites                Yale Legal Aptitude Test, Group II
  21     Word analogies           Yale Legal Aptitude Test, Group III
  22     Logical inferences       Yale Legal Aptitude Test, Group IV
  23     Memory case              Yale Legal Aptitude Test, Group V
  24     Interpretive case        Yale Legal Aptitude Test, Group VI
  25     Definitions              Yale Legal Aptitude Test, Group VII
  26     Pre-Law Grades, including those approximated from other schools
  27     Average First-Semester Law Grades     Univ. of Illinois College of Law
  28     Course Grades in Contracts            First Semester, Univ. of Illinois Col. of Law
  29     Course Grades in Torts                First Semester, Univ. of Illinois Col. of Law
  30     Course Grades in Remedies             First Semester, Univ. of Illinois Col. of Law
  31     Course Grades in Criminal Law         First Semester, Univ. of Illinois Col. of Law
  32     Course Grades in Possessory Estates   First Semester, Univ. of Illinois Col. of Law

The subjects were 133 male Law College freshmen at the
University of Illinois. Seventy-eight of these entered in the
fall of 1938 and 55 in the fall of 1939. The means of the
two groups on both test scores and grades appeared similar
enough to justify combining the data for the two years into
one study.

The product moment coefficients of correlation between
each of 21 test scores and average first-semester law grades
are shown in Table 2. Insignificant correlations, i.e., those
less than .17, for which the chances that such a coefficient
of correlation will occur in an uncorrelated population are
more than 5 in 100, are omitted. Barely significant coefficients,
i.e., those between .17 and .22, where the chances are
more than 1 in 100 that such a coefficient will occur in an
uncorrelated population, are in parentheses. No correction
has been made for attenuation or the unreliability of the
variables.
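
The limits of .17 and .22 can be checked approximately for N = 133. The sketch below uses Fisher's z transformation with a normal approximation; this is an assumption of the example and not necessarily the procedure the authors followed, and the function name `critical_r` is hypothetical.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha):
    """Smallest |r| significant at two-sided level alpha (Fisher z approximation)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal deviate
    return tanh(z_crit / sqrt(n - 3))             # back-transform from z to r


print(round(critical_r(133, 0.05), 2))  # 5 in 100 level
print(round(critical_r(133, 0.01), 2))  # 1 in 100 level
```

For N = 133 the two values round to .17 and .22, in agreement with the limits stated in the text.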

TABLE 2. PRODUCT MOMENT COEFFICIENTS OF CORRELATION WITH FIRST-SEMESTER LAW
GRADES, N = 133

Variable   Correlation with First
Number     Semester Law Grades
   1          --
   2          --
   3         (.17)
   4          .28
   5          .31
   6          --
   7          .23
   8         (.17)
   9          .24
  10          .28
  11          .31
  12          --
  13          --
  14          .25
  19          .25
  20          .39
  21          .33
  22          .30
  23         (.19)
  24          --
  25          --
  26          .49

Note: Insignificant coefficients are omitted and barely significant ones are
parenthesized.

In evaluating some of the parts of the Yale Legal Aptitude
Test it should be noted that the scores reported do not
represent separate sections with individual time limits. The
test is made up of three parts which are separately timed.
The parts are not homogeneous as to the type of item used.
The first and third parts are composed of four item types:
Word Relation, Word Opposites, Word Analogies, and Logical
Inferences. These are arranged in cycle-omnibus form
with 10 items of the same type together. The total number
of items of each type is 40. The second part is made up of
three additional kinds of questions. First are 20 memory
items dealing with a case that was presented at the beginning
of the test, before Part 1. Next are 40 items of the Interpretive
Case variety. Finally there are 20 Definition questions.

Mr. W. E. Kline of the Yale Personnel Bureau writes that
the Interpretive Case and Definition items have consistently
yielded only low correlations with grades. This result is explained
by the fact that these types of questions do not appear
early enough in a timed section for reliable scores to result.
Consequently a new form of the Yale test is being put together.
“It contains seven sub-tests, each of which is homogeneous
and individually timed.”

Tests correlating clearly significantly with law grades, as
shown in Table 2, are, in order of their coefficients from high
to low: Yale Opposites, Yale Word Analogies, Ferson-Stoddard
Matching, ACE Artificial Language, Yale Logical
Inferences, Ferson-Stoddard Logical Inferences, ACE Completion,
Minnesota Paragraph Reading Comprehension, Yale
Word Relations, ACE Number Series, ACE Arithmetic. Tests
adjacent in order seldom if ever have coefficients that are
significantly different.

It is recognized that a completely thorough understanding 
of the interrelations of the variables calls for a factor analysis. 
Such a study is planned. All intercorrelations have been 
computed. 

Tests which correlated barely significantly with law grades,
as shown in Table 2, are: Yale Memory Case, Ferson-Stoddard
Relevant Facts, and ACE Pattern Analogies.

The following tests did not correlate significantly with the
first-semester mean: Ferson-Stoddard Interpretive Case,
Ferson-Stoddard Analogous Case, Ferson-Stoddard Memory
Case, ACE Same-Opposite, Minnesota Reading Speed, Yale
Interpretive Case, and Yale Definitions.

None of the correlations is as high as .40. The Yale test
correlates slightly higher than any other test total. Some of
the American Council sub-tests and the Minnesota Reading
Comprehension correlate significantly, while some of the so-called
law aptitude sub-tests do not.

It was mentioned above that the previous study showed
that the memory case questions in the Ferson-Stoddard test
correlated insignificantly with law grades. This result is confirmed
here, but the Memory Case in the Yale examination
does give a barely significant correlation. Mr. Kline writes
that the memory questions correlated .33 with first-year grades
of the Yale Law freshmen of 1940.

Pre-law grades correlated .49 with first-semester law
grades. This coefficient is higher than any with test scores,
but considerably lower than that reported in the previous
paper. One explanation for the lower coefficient is the lessened
accuracy of the present pre-law grades. These include
grades at schools other than Illinois plus those at Illinois. Previously
only Illinois pre-law grades were included. Where
grades from different schools are combined, it seems unlikely
that the result will be as reliable a set of values as those
from one school, due to differences in grading systems. Another
reason for the decreased correlation between law grades
and pre-law grades is that the present group is more homogeneous
for pre-law grades. This situation was occasioned by
raising the requirement for entrance for Illinois students having
only 3 years’ credits from a grade-point average of 3.0
to 3.25.

The product moment coefficients of correlation between
each of five law grades and the 21 test variables are shown
in Table 3. Again, insignificant coefficients have been omitted,
and barely significant ones parenthesized as in Table 2.
Variable 30, Remedies, correlated significantly with 12 test
scores; variable 28, Contracts, with nine; variable 29, Torts,
with nine; variable 32, Possessory Estates, with four; and
variable 31, Criminal Law, with only two. Legal aptitude
tests measure more nearly what is required to master Remedies,
Torts, and Contracts, than they measure what is required
to understand Criminal Law and Possessory Estates.
The significance of these differences has not been tested. Part
of the differences could be otherwise explained if the reliability
of the grades varied markedly from one course to another, but
their reliabilities are unknown. Some thought has been given
to estimating the reliabilities using the Kuder-Richardson
method, since it does not demand split-half scores. This has
not been done because the law grades are not based on any
definite number of items. Some conversation with lawyers
suggests that Remedies does demand more reasoning than
does Criminal Law, which requires greater memorization.


TABLE 3. PRODUCT MOMENT COEFFICIENTS OF CORRELATION BETWEEN EACH OF 5 LAW
GRADES AND 21 TEST SCORES

Variable
Number      28      29      30      31      32
   1        --      --      --      --      --
   2        --      --      --      --      --
   3        --     (.17)   (.20)    --     (.19)
   4        .32     .24     .31     --     (.22)
   5        .28     .31     .34    (.19)    .25
   6        --      --      --      --
   7        .23    (.19)    .25     --      .24
   8        --     (.22)    .23     --      --
   9       (.22)   (.19)    .30     --     (.15)
  10        .26     .29     .30     .26    (.17)
  11        .37     .26     .41    (.20)   (.22)
  12        --      --     (.18)    --      --
  13        --      --      --      --      --
  14       (.21)    .24     .26     --      --
  19       (.21)    .24     .35     --      --
  20        .39     .35     .47     .26     .30
  21        .29     .32     .44    (.22)    .24
  22        .25     .27     .39    (.22)    --
  23        .25    (.22)    --             (.19)
  24        --      --     (.21)    --      --
  25        --      --     (.22)    --      --

Note: Insignificant coefficients are omitted and barely significant ones are
parenthesized.

It will be noted that three of the correlations with Remedies
are higher than any of those with the semester means.
The differences are scarcely reliable.

It can be tentatively concluded that while no legal aptitude
test correlated as high with law grades as do pre-law
grades, the most predictive tests are those that call for reasoning
rather than memory. The reasoning tests may use
words or numbers for symbols, but there seems to be an
advantage for the former, as might be expected.

Each of the two legal aptitude tests correlates higher
with pre-law grades than with law grades. This difference
might be explained if the pre-law grades are more reliable
than the law grades. The authors have not been able to determine
the reliability of either. The law grades might be
expected to be more reliable from the fact that the law course
is more homogeneous than is the varied pre-legal curriculum.
On the other hand, grades based on 6 to 8 semesters of pre-law
work, because of the increased reliability with additional
length, would be expected to be more reliable than grades
from a single semester of law.
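
The gain in reliability with additional length is usually expressed by the Spearman-Brown formula. The sketch below assumes that each semester's grades behave as a parallel measure and uses a purely hypothetical single-semester reliability of .50, since the actual reliabilities are stated to be unknown; the function name is likewise an assumption.

```python
def spearman_brown(r_single, k):
    """Reliability of a measure lengthened k times, given reliability r_single."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical single-semester reliability, for illustration only.
r_one_semester = 0.50
for semesters in (1, 6, 8):
    print(semesters, round(spearman_brown(r_one_semester, semesters), 3))
```

Under these assumptions the averaged grades from 6 to 8 semesters would be considerably more reliable than those from one semester, which is the direction of the argument in the text; the particular figures carry no evidential weight.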

Since the two so-called “legal aptitude” tests correlate
lower with law grades than with other college grades and
since several tests that are not called “legal aptitude” correlate
higher with law grades than several that are putative
measures of law-school success, the question is raised as to
the possible existence of a factor or factors of legal aptitude.
The factor analysis of these data may contribute a clearer
answer to the question.






AN EXPLORATORY STUDY OF SOCIAL GUIDANCE
AT THE COLLEGE LEVEL¹


MARGARET GLOCKLER ALDRICH 
University of Minnesota 


Within the last 10 years there has been an increasing
emphasis on guidance at the college level. This
movement is important, but it is significant that the personnel
workers in institutions of higher education have been concerned
almost exclusively with educational and vocational
problems. In some cases, however, college authorities have
come to realize that there are certain social problems and
adjustments which should be considered. Many college programs
fail to provide social stimulation and opportunity for
participation. Extra-curricular activities have developed on
college campuses to fill this need. Most of these activities
have been developed on the basis of student initiative in spite
of, rather than because of, faculty approval.

An interest in the development of social adjustment in 
colleges led to the following experimental evaluation of social 
guidance at the college level. 

Naturally, the evaluation of guidance has lagged far behind
the development of guidance techniques. There have been
several attempts to determine the effects of diagnosis and
treatment of educational and vocational problems [Beaumont
(1), Wrenn (10), and Williamson and Bordin (8)]. These
studies indicate the possibilities in this field. However, there
has been no such study of social guidance, even though several
workers have recognized the need for evaluating this type of
guidance [Tuttle (7), Livingood (5)]. Others [Burke (3),
Mallay (6), and Williamson and Darley (9)] have attempted
to distinguish the socially “well” adjusted from the socially
“poorly” adjusted. But in the psychological literature of the
last five years there is no report of a study of the effect of any
particular controlled factor on social adjustment.

¹This study was undertaken at the suggestion of Professor D. G. Paterson
of the University of Minnesota. His advice and interest made its completion
possible. Dean E. G. Williamson, Dr. John Darley, and the counselors of the
University of Minnesota Testing Bureau, as well as various extra-curricular
organizations on that campus, made possible the execution of the problem.

Several investigations at the University of Minnesota have
demonstrated the need for further concern with the social and
extra-curricular program [Chapin (4), Brown (2)]. The
more extensive of these was that by Brown in 1934. She found
that one third of the students spend no time or money in
activity participation and concluded that “students most needing
social contacts were those who profited least from the
opportunities offered” (2:263).

These conclusions might be interpreted to mean that there
is little hope of better adjusting asocial students. Another possibility
is that if these students who did not participate were
given an intensive well-directed program of participation, they
might become better adjusted socially. Is it possible to lead
the asocial to activities and find any changes in their social
interests and attitudes?

The essential plan of this research was to expose students
to certain social influences and measure any changes resulting
from the contacts formed. To be of value, it was necessary
that these influences be normal extra-curricular and counseling
activities available to all college students. It was also necessary
that the control group technique be used to determine
what would occur without these special influences. This need
led first to a consideration of possible methods for measuring
changes that occur in the social adjustment of college students.
To make the group as homogeneous as possible, it seemed
advisable to limit the study to freshman girls. Since all the
girls were to be treated as a part of a normal counseling program,
it was further necessary to study only girls who had
gone through the University Testing Bureau. This Bureau is
a counseling agency set up by the University as a personnel
service open to all students. Testing Bureau cases are given a
rather extensive testing program including a series of personality
scales. During the summer of 1939, 198 freshman
girls came to the Bureau for guidance prior to registration in
the University. From this group the experimental group was
further selected by the requirement that the research be done
on asocial girls as indicated by personality test scores and
activity records.

These conditions help to explain why the general problem
of the effect of social guidance becomes quite specific, i.e., what
is the extent of change, if any, in the measured social adjustments
and activity records of “under-socialized” University
Testing Bureau freshman girls following counseling on social
problems and directed participation in extra-curricular and
social activities?

The first step in the attack on this problem was the selection
of the sample group. The case records of the 198 Testing
Bureau cases were read and a record kept of high school
scholarship percentile rank, raw score and percentile rank on
the American Council Psychological Examination, the Cooperative
English Test, the Minnesota Inventory of Social
Attitudes (Forms P and B), the Bell Adjustment Inventory
(Social), and the Rundquist-Sletto Inferiority Scale. In addition,
each girl's group and individual activities listed on the Individual
Record Form of the Testing Bureau were recorded.
These two sections are given in the form of a check list on
which the subject is asked to indicate those activities “in which
you engage frequently.” The group activities include team
sports, clubs, church organizations, and group parties, while
the individual activities are things done alone or with a single
other individual, such as sewing, reading, and tennis.

Those girls who had scores in the lower one half of two
of the three distributions of Social Preferences, Social Behavior
(Minnesota Inventory of Social Attitudes, Forms P
and B), and group activities were selected as the sample
group. These three measures were assumed to give an indication
of the preferences, behavior, and previous interest in
social activities. It should be noted that in order to get a
sample of any size the levels had to be fairly high, running up
to the median or even higher. This selection from the 198
Testing Bureau cases yielded a sample of 79 freshman girls.
This group was then divided into two random samples of 40
and 39, which became the experimental and control groups.
The remaining 119 Testing Bureau cases were used for purposes
of comparison.
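
The selection rule and the random division can be sketched in Python. The sketch is illustrative only: the field names, the use of the median as the boundary of the "lower one half" (with scores equal to the median counted as low), and the seeded shuffle are assumptions of the example, not details reported in the study.

```python
import random
from statistics import median


def select_sample(cases, measures, min_low=2):
    """Cases at or below the median on at least `min_low` of the measures."""
    medians = {m: median(c[m] for c in cases) for m in measures}
    return [
        c for c in cases
        if sum(c[m] <= medians[m] for m in measures) >= min_low
    ]


def split_random(sample, seed=0):
    """Shuffle a copy of the sample and cut it into two nearly equal groups."""
    rng = random.Random(seed)
    shuffled = list(sample)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2   # the larger group takes the odd case
    return shuffled[:half], shuffled[half:]


# Hypothetical records with three measures per case (not the study's data).
cases = [
    {"id": i, "preferences": i, "behavior": i, "group_activities": 10 - i}
    for i in range(1, 10)
]
sample = select_sample(cases, ["preferences", "behavior", "group_activities"])
experimental, control = split_random(sample)
print(len(sample), len(experimental), len(control))
```

Applied to a sample of 79 cases, `split_random` yields groups of 40 and 39, the sizes reported in the text.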

The treatment of these two groups must be emphasized
here since this is the crux of the method. The control group
of 39 cases was in no way influenced by this study. These girls
were handled in the customary manner by the Testing Bureau.
Following a preliminary interview and testing, each girl was
assigned to one of the five counselors in the Bureau. The usual
counseling interview is chiefly concerned with educational and
vocational problems. If the social aspects are of importance,
however, they may also be considered. No generalizations can
be made concerning the social counseling of the control group
except that they were exposed to the “normal” counseling program
which might include some social guidance.

The treatment of the experimental group, however, went
further in giving all of the members of this group an opportunity
to participate in social activities. Nine girls in this
group as well as 11 in the control group failed to complete
the counseling and retesting program. Seven girls in this group
were already participating, and the counselors merely discussed
their social interests with them. Four were untreated
because the counselors felt that academic activities should take
all of their time if they were to continue in school. The remaining
20 were interviewed by the counselors with a special
emphasis on social adjustment. Each interview took place at
the end of the first quarter of the school year. The investigator
consulted with the counselors concerning the activities
which might appeal to each girl, but the interview was very
much an individual affair. Following this contact, the investigator
attempted to carry out the suggestions of the counselors
by personally introducing the girl to those activities in
which she expressed an interest. Active participation was
facilitated in every way possible. At the same time every
precaution was taken to make the social program normal. Introductions
were made to various campus organizations which
had been informed that the Testing Bureau had appointed a
special counselor to act as liaison officer between the Bureau
and the organization. The extra-curricular organization heads
did not know that this was in any way a research project, and
there is every reason to believe that they gave these girls
attention similar to that given any girls recommended by a
campus agency. The experimental and control groups both
had some counseling, but the experimental group had more
than the control group.

At the end of the school year both the experimental and
the control groups were retested. After three notices all but
20 girls responded; of the 79 originally selected for study, 31
experimental and 28 control subjects were retested. The tests
which were given again were the Bell, the Rundquist-Sletto,
the Social Preferences, and the Social Behavior. Also included
was an Activity Record covering the freshman year. The tests
were given under conditions identical with those of the
original testing.

It is important that some consideration be given to the
original nature of the control and experimental groups. Using
the common t test for the significance of the difference between
two means,² the sample group (the 59 cases who were
retested) does not differ from the rest of the Testing Bureau
cases (the Freshman women Bureau cases which were read
but not selected for this study) in mean American Council on
Education test score or the Cooperative English Test score.
It does have a significantly higher mean score than frequently
used norm groups. As would be expected, the experimental
and control groups are significantly lower in mean score on the
Social Behavior and Preference scales when compared with
the rest of the Bureau group, but they do not differ in mean
Social Behavior score from college Freshman women norm
groups as determined from the norms given by the authors of
the test. The difference between the mean Social Preference
scores for the sample group and the same norm group is just
significant (with the sample group lower). The experimental
and control groups differ on the Bell and Rundquist-Sletto
from the rest of the Bureau group but not from comparable
norm groups. These facts lead to the conclusion that although
the experimental and control groups are socially “poorly”
adjusted when compared with the rest of the Testing Bureau
group, they are not clearly different on personality measures
from the comparable norm groups. Thus this study was not
confined to a group of extreme deviates in personality scores.

² t = the difference in means divided by the standard error of that difference.
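The footnote's definition of t can be sketched in a few lines. This is only an illustration: it assumes two independent groups and an unpooled standard error (the article does not say which standard-error formula was used), and the scores are invented placeholders, not data from the study.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Difference in means divided by the standard error of that difference.

    Assumption: independent groups with an unpooled standard error.
    The article does not state its own formula.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical scores, for illustration only.
group_a = [12, 15, 11, 14, 13]
group_b = [10, 9, 12, 8, 11]
print(round(t_statistic(group_a, group_b), 2))
```

The resulting value would then be referred to a t table with the appropriate degrees of freedom.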

Although the control group has a slightly higher median 
high school percentile rank than the experimental group, the 
two groups are remarkably similar in original testing on the 
six objective measures, and all indicated group and individual 
activities. It is, therefore, safe to assume that any significant 
differences on retesting may be attributed to differential treatment.
It was also found that the 20 cases who did not appear
for retesting did not differ significantly on original testing 
from the rest of the sample group. 

The differences on retesting between the experimental and
control groups provide the basis for an estimate of the success
of social guidance. It would be desirable to obtain some
estimate of the amount of guidance and relate this to the
amount of participation, although this was not done here. A
simple comparison was made of mean gains. These results
and those from the Activity Record, using t tests where possible,
indicate that:

1. There was a significant mean gain made by both the
experimental and control groups on retesting on the Rundquist-Sletto
Inferiority, the Social Preferences, and the Social Behavior
scales. The control group did not gain on the average
on retaking the Bell Social scale, while the experimental group
did gain.

2. A comparison of the mean gains made on retesting
after 9 to 11 months on these measures by the experimental
and control groups shows that on all except the Social Preference
scale the experimental group gained significantly more
(see Table 1).

TABLE 1

MEAN GAINS MADE ON THE FOUR PERSONALITY MEASURES BY THE EXPERIMENTAL AND
CONTROL GROUPS ON RETESTING

                      Experimental               Control
Measure           N   Mean Gain    SD       N   Mean Gain    SD        t       P
Social Beh.      30     4.40     10.12     28     2.21     12.33     3.65    <.01*
Social Pref.     30     6.20     13.00     28     6.18     15.47      .03    >.05
Rund-Sletto      28     5.17      7.55     26     1.85      6.81     7.64    <.01*
Bell-Social      29     3.28      5.95     26     0.00      7.43     8.77    <.01*

* Significant

There is also some indication that the gain is greater for the
members of the experimental group who were given the most
guidance. The fact that the Social Preference scale shows an
insignificant difference in mean gain for the two groups suggests
that social guidance has an effect on the actual social
behavior or amount of social activity but does not affect social
preferences.

3. At the beginning and at the end of the experimental 
period the counselors rated the members of the experimental 
group on a rough scale of social adjustment. Only seven per 
cent of the group was rated lower on second rating and 38 
per cent rated higher. There is no comparable measure for 
the control group, so the significance of this gain is difficult to 
interpret. 

4. The experimental and control groups encircle about the
same number of individual activities on retesting, but the
experimental group encircles more group activities than the
control group on retesting.

5. The experimental group reports more hours per week 
spent in extra-curricular activities than the control group and 
more offices and committees in these activities. 

6. The experimental group indicates on a rating scale that
they want to participate in fewer additional activities, think
that they have made more friends, feel that they have participated
in more activities compared with high school, and
have a better opinion of the extra-curricular and social program
on the campus than the control group.

7. Both the experimental and control groups feel that
they are in fewer activities but have made more friends in
college than in high school.

All of these findings combine to indicate that, from this
small sample, social guidance and directed participation in
extra-curricular activities improve the “social adjustment” of
freshman girls as measured by personality scales and a questionnaire.
Not only do the girls in the experimental group
make greater mean gains, but they feel that they have more
friends, participate in more activities, and are less critical of
the social program than the control group. A treatment that
makes people feel better satisfied with their social life is certainly
worthy of further consideration. The problem was,
however, essentially an investigation of a method and as such
the results should be emphasized only as a justification for the
further use of the method.

REFERENCES 

1. Beaumont, H. “The Evaluation of Academic Counseling”, Journal
of Higher Education, X, (1939), 79-82, 116.

2. Brown, Clara. “A Social Activities Survey”, Journal of Higher
Education, VIII, (1937), 257-265.

3. Burks, F. W. “Some Factors Related to Social Success in College”,
Journal of Social Psychology, IX, (1938), 125-140.

4. Chapin, F. Stuart. Extra-curricular Activities at the University of
Minnesota. Minneapolis: University of Minnesota Press, (1929).

5. Livingood, F. G. “Directed Extra-curricular Activities and Adjustments”,
Mental Hygiene, XX, (1936), 614-623.

6. Mallay, H. “A Study of Some of the Factors Underlying the
Establishment of Successful Social Contacts at the College Student
Level”, Journal of Social Psychology, VII, (1936), 205-228.

7. Tuttle, H. S. “The Campus and Social Ideals”, Journal of Educational
Research, XXX, (1936), 177-182.

8. Williamson, E. G. and Bordin, E. S. “Evaluating Counseling by
Means of a Control-group Experiment”, School and Society, LII,
(1940), 434-440.

9. Williamson, E. G. and Darley, J. G. “The Measurement of Social
Attitudes of College Students. II. Validation of Two Attitude
Tests”, Journal of Social Psychology, VIII, (1937), 231-242.

10. Wrenn, C. G. The Evaluation of Guidance, Purdue University,
Studies in Higher Education, No. 37, (1940), 51-61.



NEW TESTS* 


Cooperative Chemistry Test for College Students, by B. Clifford Hendricks,
B. H. Handorf, O. M. Smith, Chris F. Keim, Rufus D.
Reed, Alexander Calandra, Ralph W. Tyler, and Fred P. Frutchey.
Form 1942. Part I, Information and Vocabulary; Part II, Problems
and Equations; and Part III, Scientific Method. Time, 90
minutes. 10 to 99 copies 6½c; 100 or more copies 6c; specimen
set 25c. Published by the Cooperative Test Service, 15 Amsterdam
Avenue, New York City.


Cooperative English Test, by Geraldine Spaulding and Frederick B.
Davis. 1942. Form S. Test A, Mechanics of Expression; Test
B1, Effectiveness of Expression (Lower Level); Test B2, Effectiveness
of Expression (Higher Level); Test C1, Reading Comprehension
(Lower Level); Test C2, Reading Comprehension (Higher
Level). Time, 40 minutes for each test. 10 to 99 copies 5½c; 100
or more copies 5c; specimen set 25c. Published by the Cooperative
Test Service, 15 Amsterdam Avenue, New York City.


Cooperative French Test, by Geraldine Spaulding, Laura Towne, and
Sarah Wolfson Lorge. 1942. Form S. Lower Level for use in the
first two years of high school or the first year of college; Higher
Level for use with students who have had more than two years study
of French in high school or more than one year in college. Part
I, Comprehension; Part II, Grammar; and Part III, Civilization.
Time, 80 minutes. 10 to 99 copies 6½c; 100 or more copies 6c;
specimen set 25c. Part I available as separate booklet, 10 to 99
copies 5½c; 100 or more copies 5c; specimen set 25c. Published by
the Cooperative Test Service, 15 Amsterdam Avenue, New York
City.


Cooperative Italian Test, by Peter Riccio and Anthony Cuffari. 1942.
For students who have had two semesters or more of study of Italian.
Experimental Form S. Time, 70 minutes. Part I, Reading; Part
II, Vocabulary; and Part III, Grammar. 10 to 99 copies 6½c; 100
or more copies 6c; specimen set 25c. Published by the Cooperative
Test Service, 15 Amsterdam Avenue, New York City.


Cooperative Latin Test, by Harold V. King and Geraldine Spaulding.
1942. Form S. Lower Level to cover beginning Latin and Caesar;
Higher Level for use with students who have completed one semester
or more of study beyond Caesar. Part I, Comprehension; Part II,
Grammar; and Part III, Civilization. Time, 80 minutes. 10 to 99
copies 6½c; 100 or more copies 6c; specimen set 25c. Part I available
as separate booklet, 10 to 99 copies 5½c; 100 or more copies
5c; specimen set 25c. Published by the Cooperative Test Service,
15 Amsterdam Avenue, New York City.

*Prepared by Jane Gilbert.


Cooperative Test in Secondary School Mathematics (Higher Level),
by Margaret Martin, William Mollenkopf, Radcliffe W. Bristol,
William S. Litterick, and Carroll G. Ross. 1942. Form S. For
grades 10 to 12. Time, 80 minutes. 10 to 99 copies 6½c; 100
or more copies 6c; specimen set 25c. Published by the Cooperative
Test Service, 15 Amsterdam Avenue, New York City.


Interest Inventory for Elementary Grades, by Mitchell Dreese and Elizabeth
Mooney. 1941. Time, about 30 minutes. 5c each; manual
15c; specimen set 25c. Published by the Center for Psychological
Service, George Washington University, Washington, D. C.


Meier Art Judgment Test, by Norman Charles Meier. Revised 1943.
Grades 7 through adult. Time, about 45 minutes. Test books 75c;
$3.50 for 5; $6.25 for 10; 55c in lots of 25; record sheets 2½c; 2c
per 100; manual 10c; sample set 90c. Published by the Bureau of
Educational Research, State University of Iowa, Iowa City, Iowa.


Otis Classification Test, by Arthur S. Otis. Revised 1941. Forms R,
S, and T. For grades 4 to 8. Time, 30 minutes for each part.
Hand- and machine-scored. $1.25 per 25; specimen set 30c. Published
by the World Book Company, Yonkers-on-Hudson, New
York.


Pintner-Durost Elementary Test, by Rudolf Pintner and Walter N.
Durost. For grades 2, 3, and 4. Form A, Scale 1 (Picture Content)
and Scale 2 (Reading Content). $1.35 per 25 for Scale 1;
$1.20 per 25 for Scale 2; specimen set (Scale 1 and Scale 2) 30c.
Published by the World Book Company, Yonkers-on-Hudson, New
York.


Preference Record, by G. Frederic Kuder. 1942. Form BB for self-scoring;
Form BM for machine-scoring. For high-school and college
students and adults. Time, about forty minutes. Test booklets
25c; answer pads 5c; profile sheets $1.25 per 100; specimen set 25c.
Published by Science Research Associates, 1700 Prairie Avenue, Chicago,
Illinois.


Purdue Placement Test in English, by J. H. McKee, G. S. Wykoff,
and H. H. Remmers. 1941. For high school seniors and college
freshmen. Time, about 35 minutes. Form C, $1.65 per 25; separate
answer sheets 75c per 25. Published by Houghton Mifflin Company,
2 Park Street, Boston, Massachusetts.


Terman-McNemar Test of Mental Ability, by Lewis M. Terman and
Quinn McNemar. 1942. For grades 7 to 12. Time, 40 minutes.
Forms C and D. $1.25 per 25; specimen set 20c. Published by
the World Book Company, Yonkers-on-Hudson, New York.


Study-Habits Inventory, by C. Gilbert Wrenn. Revised 1941. For 
grade 12 and college. $1.25 per 25, $3.50 per 100; $2.50 per 100 
for 1000 or more. Published by Stanford University Press, Stanford 
University, California.


Test of Practical Judgment, by Alfred J. Cardall. 1942. For 12th
grade level and above. Time, about 45 minutes. Hand- or machine-scored.
10c each; specimen set 25c. Published by Science Research
Associates, 1700 Prairie Avenue, Chicago, Illinois.


The World Test, by Charlotte Buehler and Gayle Kelley. 1941. To 
measure emotional problems. For clinical use with children 5 to 11. 
Time, about 20 minutes. Complete test materials, manual, and 25 
record forms, $60.00. Published by the Psychological Corporation,
522 Fifth Avenue, New York City. 





MEASUREMENT ABSTRACTS*

Bellows, R. M. “Procedures for Evaluating Vocational Criteria,”
Journal of Applied Psychology, XXV (1941), 499-513.

The fact that the basic vocational criteria used in the evaluation of
predictive instruments are fallible is generally neglected. The source
of fallibility may lie in such factors as (1) illicit use of predictive information
giving previous knowledge of psychological test scores or
other performance ratings; (2) artificial limitations of production
brought about by physical conditions influencing output of work; (3)
differential experience or training. To overcome the influence of criterion
contamination several checks are recommended and evaluated.
Knowledge of the future validity of a predictor is impossible because
of various changes in the situation. No single procedure for criterion
evaluation is adequate, which suggests that indices of validity are largely
determined by the degree of fallibility of the criterion, and that the
interpretation of such indices is dependent upon knowledge of the criterion
used in validation. L. Bouthilet.


Buros, Oscar K. (editor). The Second Yearbook of Research and Statistical
Methodology. Highland Park, New Jersey, Gryphon
Press. 1941.

This yearbook has been compiled in an effort (a) to make students
and teachers of statistics aware of inaccuracies and the inadequacy of
much current statistical literature and information, (b) to serve as a
source for selection of textbooks with discrimination, (c) to evaluate
weak and strong points of statistical books, (d) to point out current
developments in monograph and textbook writing and criticism, (e)
to acquaint statistical workers with the broad applications of statistical
work in many fields, (f) to present different points of view among
students of statistical theory, (g) to improve the quality of such book
reviews by more careful choice of reviewers and by stimulating reviewers
not to review books which they cannot appraise adequately.

The editor has greatly increased the scope of this volume over an
earlier one, including 1,652 review excerpts from 283 journals. An
attempt has also been made to list books on research methodology in
specific fields, although this list is by no means inclusive. However,
this yearbook represents a significant contribution to the field of methodology
and should make workers in this field more acutely aware of current
developments. Jane Gilbert.

Cronbach, L. J. “An Experimental Comparison of the Multiple True-False
and Multiple Multiple-Choice Tests.” Journal of Educational
Psychology, XXXII (1941), 533-543.

Two subject-matter tests, one in multiple true-false form, and the
other in multiple multiple-choice form, were administered to 57 and 60
students, respectively. The former consists of multiple-choice items in
which each alternative is marked true or false by the student; the latter,
of similar items in which only correct alternatives are marked. Results
showed the two forms were essentially equivalent. The hypothesis is
advanced that the tendency to mark uncertain items “true” may be a
personality trait which may influence the validity of true-false test scores.
L. Birdsall.

*Edited by Forrest A. Kingsbury.


Ewart, E., Seashore, S. E., and Tiffin, J. “A Factor Analysis of an
Industrial Merit Rating Scale.” Journal of Applied Psychology,
XXV (1941), 481-486.

In order to determine how many traits actually influence the ratings,
tetrachoric intercorrelations were computed for ratings on a twelve-trait
scale constructed for use in a large industrial plant. This correlation
matrix was factored by Thurstone’s centroid method, and the
factors rotated for simple structure. Three factors were obtained: I,
a general factor, termed “ability to do the present job,” accounts for
most of the total variance of the scale. Factor II represents knowledge
or skill over and above the requirements for the specific job. Factor
III is on the variable “health.” Factors I and III are orthogonal while
Factors I and II are oblique. K. S. Yum.


Forlano, G. and Pintner, R. “Selection of Upper and Lower Groups
for Item Validation.” Journal of Educational Psychology, XXXII
(1941), 544-549.

Two sets of data from the Study Habits Inventory and the Home-Background
Survey Test have been subjected to item validation, using
five different methods of selecting upper and lower groups. The authors
conclude that for a simple and rapid, rough-and-ready method of validation
of test items of the inventory type, the upper versus lower 27 per
cent method is preferable, even though distributions are more or less
non-normal. The other upper versus lower methods studied were 50
per cent, 33⅓ per cent, 16 per cent, and 7 per cent. K. S. Yum.
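An upper-versus-lower comparison of this kind can be sketched as below. The abstract does not say which index the authors computed, so the difference in endorsement proportions used here is an assumed, simple choice; the function name and the data are hypothetical.

```python
def upper_lower_index(total_scores, item_responses, fraction=0.27):
    """Proportion endorsing an item in the upper group minus that in the
    lower group, where the groups are the top and bottom `fraction` of
    respondents ranked by total score.

    Assumption: a simple difference of proportions; the abstract does not
    name the index actually used.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))          # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_responses[i] for i in upper) / k
    p_lower = sum(item_responses[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: ten respondents, one item scored 0 or 1.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(upper_lower_index(totals, item))         # 27 per cent groups
print(upper_lower_index(totals, item, 0.50))   # 50 per cent groups
```

Changing `fraction` to 0.50, 1/3, 0.16, or 0.07 gives the other splits named in the abstract.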


Gilman, W. A. and Gray, D. E. “Guessing on True-False Tests.”
Educational Research Bulletin, XXI (1942), 9-12.

The attempt to penalize guessing by subtracting the number of
wrong answers from the number of correct answers is ineffective. This
is clear from the study of a case in which there are n pure guesses.
Theoretically the student would have n/2 correct answers and n/2 incorrect
answers; hence the net increment of n/2 - n/2 = 0 would leave his grade on other
items unaltered. In practice, pure guessing rarely exists. The possession
of partial knowledge gives the student better than a fifty per cent
chance; hence it is to his advantage to guess on tests thus scored. George
W. Boguslavsky.
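The argument can be checked with a short expected-value sketch. If each of n guessed items has probability p of being right, the expected change in a rights-minus-wrongs score is np - n(1 - p) = n(2p - 1), which is zero at p = .5 and positive for any p above .5. The function name and the figures below are illustrative assumptions, not values from the paper.

```python
def expected_rights_minus_wrongs(n_guessed, p_correct):
    """Expected change in a rights-minus-wrongs score from answering
    n_guessed true-false items, each right with probability p_correct."""
    expected_right = n_guessed * p_correct
    expected_wrong = n_guessed * (1 - p_correct)
    return expected_right - expected_wrong


# Illustrative values only.
print(expected_rights_minus_wrongs(20, 0.5))   # pure guessing: no expected change
print(expected_rights_minus_wrongs(20, 0.6))   # partial knowledge: expected gain
```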


Growdon, C. H. “The Revised Stanford-Binet Scale Applied as a
Point-Scale.” Journal of Applied Psychology, XXV (1941), 660-671.

Form L of the Revised Stanford (from Year VI up) has been arranged
as a point-scale. By testing each subject only with that range
of tests limited by 5 consecutive successes below the first failure and 5
consecutive failures above the last success, a very reliable mental age
is obtained. Rescoring the records of 440 children of the usual clinical
types yields I. Q.’s which correlate .976 with regular Stanford-Binet
I. Q.’s. For eleven mental age levels the average saving in number of
tests given was about 35 per cent, with I. Q. variations not to exceed
5 points in 9 of every 10 cases. F. A. Kingsbury.


Harding, J. “A Scale for Measuring Civilian Morale.” Journal of
Psychology, XII (1941), 101-110.

Out of a list of 59 items given to two criterion groups, high morale
and low morale, 20 items were chosen to form the present morale scale.
Each item is included in one of the four clusters: a. an attitude of confidence
in the broad framework of capitalist democracy; its opposite,
cynicism; b. an attitude of tolerance for various groups; c. an attitude
of realism as opposed to wishful thinking; d. an attitude of assertive
idealism in international affairs. Scoring for each item is on a five-point
scale. Thus “total morale scores” may be computed. Louise Grossnickle.


Heston, J. C., and Cannell, C. F. “A Note on the Relation Between
Age and Performance of Adult Subjects on Four Familiar Psychometric
Tests.” Journal of Applied Psychology, XXV (1941),
415-419.

Vocabulary Tests from Form L of the Revised Stanford-Binet Scale,
Knox Cubes, Porteus Mazes and Ferguson Form Boards Tests were
given to members of borrower families of the F.S.A. in Ohio, Maine,
and Missouri. The data include 643 cases, 375 men and 268 women,
all white. The age range for men was 15 to 76, and for women, 15
to 72, with medians at 37.5 and 35.0 years respectively. Two contrasting
tendencies are noted on the age curves of scores of these tests. On
the vocabulary test there is a rapid increase from age 15 to 20, then a
slight rise up to 55, where a small drop occurs; while on the performance
tests a rapid decline seems to be a characteristic tendency. K. S. Yum.


Jones, H. E. “Seasonal Variations in I. Q.” Journal of Experimental
Education, X (1941), 91-99.

A study of 19 comparisons of fall-to-spring versus spring-to-fall I. Q.
changes in children of preschool age revealed that 18 of the 19 comparisons
show a greater gain over the winter interval than over the summer
interval. Four alternative hypotheses were considered:

1. Seasonal variations in the testers.

2. Seasonal variations in test performances.

3. The dependence of performance on seasonal variations in the
child’s activity.

4. The effect of seasonal variations on mental and physical
growth.

George W. Boguslavsky.


Katz, Evelyn. “The Constancy of the Stanford-Binet I. Q. From
Three to Five Years.” Journal of Psychology, XII (1941),
159-182.

The Brush Foundation of Western Reserve has the records of 308
children of high socio-economic level, tested at six-month intervals from
three to five years of age. “Test-retest correlations range from .533 to
.765, the size of the correlations being unrelated to age but inversely
related to the interval between tests.” “The group as a whole shows
a small increase in I. Q. with age.” Large gains and losses of 20 or
more points are more frequent over the longer intervals of time and
for the younger ages. They are present in approximately 10 per cent
of the test-retest comparisons, and occur for 40 per cent of the children.
“These frequent fluctuations should probably be regarded as typical of
children between three to five years who come from families of superior
socio-economic status.” Helen M. Wolfle.


Lindquist, E. F. A First Course in Statistics. Boston, Houghton
Mifflin Company. 1941. 240pp.

This elementary statistics textbook presents a well-organized approach
to the problem of measurement. An accompanying workbook
has been designed to help the student integrate the theoretical approach
with actual practice in applying these principles. The topics presented
are as follows: frequency distribution, percentiles, graphical representation
of frequency distributions, measures of central tendency, measures of
variability, the nature of the normal curve, sampling error theory, standard
measures and methods of combining test scores, correlation theory,
and correlation techniques applied in the evaluation of test materials.
Jane Gilbert.


Morrow, Robert S. “An Experimental Analysis of the Theory of
Independent Abilities.” Journal of Educational Psychology, XXXII
(1941), 495-511.

“Eighty relatively homogeneous male college students were given
in a random manner” 23 subtests of standard tests of intelligence, artistic
judgment, and clerical, mechanical, and manipulative ability. The correlations
were analyzed by the “center of gravity” method into four
factors. The factors were not rotated and were difficult to interpret.
“By virtue of these findings,” states the author, “it would appear that
the Spearman and Thurstone theories are inadequate for explaining
the relationships expressed in this study. Rather, one must conclude
with the hypothesis that the abilities here tested are not disparate and
static abilities, but that they are, instead, functional and dynamic relationships
within the total personality.” Helen M. Wolfle.


Reed, H. B. “The Place of the Bernreuter Personality, Stenquist Mechanical
Aptitude, and Thurstone Vocational Interest Test in College
Entrance Tests.” Journal of Applied Psychology, XXV (1941),
528-534.

This study examined the inter-relations between
the Bernreuter, Stenquist, and Thurstone tests and scholastic achievement
in order to find the place of such tests in a battery of college
entrance examinations. The Bernreuter scores were also compared with
teachers’ ratings of traits of the same name as those in the test. Results
show that there was little or no relationship between the three tests and
scholastic achievement. It is concluded that the tests are of little value
for guidance in choice of college courses, although the usefulness of the
tests for other purposes was not investigated. L. Bouthilet.


Robinson, Francis P. Diagnostic and Remedial Techniques for Effective
Study. New York, Harper and Brothers. 1941. 318pp.

This handbook has been evolved as a result of the author’s experience
at the State University of Iowa and an extensive how-to-study
program at Ohio State University. The major emphasis in this book
has been placed on diagnostic tests which are based on research analyses
of college work and student errors rather than standard academic organization.
The types of areas measured include study habits, reading
skill, skill in use of academic resources, knowledge of fundamental processes
and background knowledge, health, vocational planning, social adjustment,
personal problems and motivation. All materials necessary for
test administration and scoring are included in this book. Comparable
retests are also available to help the student evaluate his improvement
and to see the nature of his remaining problems. The book cannot be
used independently by a student, but it should form a working basis
for individual counseling and delineation of specific areas in which remedial
treatment is indicated. Jane Gilbert.


Shuttleworth, F. K. “Sampling Errors Involved in Incomplete Returns
to Mail Questionnaires.” Journal of Applied Psychology, XXV
(1941), 588-591.

There has been little attempt to determine the sampling errors due
to incomplete returns of mail questionnaires. The only adequate check
is to compare incomplete returns with complete returns. In a study of
the employment status of certain university alumni, it was found that
serious sampling errors were involved, the earliest returns coming from
the more successful alumni. The conclusion is drawn that each questionnaire
situation needs intensive study, which should include a complete
return from at least a portion of the total population. L. Bouthilet.


Sloan, W. and Sharp, A. A. “A Note on Interpolation of Kent Oral
Emergency Test Scores into Mental Age Years and Months.” Journal
of Applied Psychology, XXV (1941), 592-594.

The method consists of dividing equally the 12 mental age months
in each year so that the first point at each year level falls exactly on that
year. A corresponding column of I. Q.’s for adults is given with the
chronological age of sixteen as a constant divisor. K. S. Yum.
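One reading of this interpolation, together with the ratio I. Q. that uses sixteen years as a fixed divisor, is sketched below. The score table is an invented placeholder rather than the Kent norms, and the function names are hypothetical; the abstract gives only the general principle.

```python
# Hypothetical table: the lowest score credited with each mental-age year.
# Placeholder values for illustration, NOT the Kent test norms.
YEAR_FLOOR_SCORES = {10: 20, 11: 24, 12: 30}


def mental_age_months(score):
    """Mental age in months, spreading the 12 months of each year evenly
    over the score points that fall within that year."""
    years = sorted(YEAR_FLOOR_SCORES)
    for lo, hi in zip(years, years[1:]):
        s_lo, s_hi = YEAR_FLOOR_SCORES[lo], YEAR_FLOOR_SCORES[hi]
        if s_lo <= score < s_hi:
            return lo * 12 + 12 * (score - s_lo) / (s_hi - s_lo)
    raise ValueError("score outside the tabled range")


def adult_iq(score, divisor_years=16):
    """Ratio I. Q. for adults with a constant chronological-age divisor."""
    return 100 * mental_age_months(score) / (divisor_years * 12)


print(mental_age_months(22))   # halfway through the 10-year level
print(adult_iq(22))
```

The first score of each year maps exactly onto that year (here a score of 20 gives 120 months), which matches the condition stated in the abstract.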


Stalnaker, J. M. “A Note on the Computation of Y Values for Integral
Values of X, when Y is a Linear Function of X.” Journal of Educational
Psychology, XXXII (1941), 559-560.

The author reports a method for rapid and accurate determination
of converted scores for a large number of raw scores, with the aid of
accounting machines and punched-card methods. The method demands
a minimum of hand labor. The procedure is applicable to any situation
where one set of scores is to be transmuted into any set of scores,
providing the two sets are in a linear relationship, and the one variable
changes in unit steps. K. S. Yum.
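When X moves in unit steps, each converted score differs from the previous one by the constant slope, so a full conversion table for Y = a + bX can be built by repeated addition. Whether this is the author's punched-card procedure is not stated in the abstract; the sketch below is an assumption, with hypothetical names and constants.

```python
from fractions import Fraction


def conversion_table(a, b, x_min, x_max):
    """Converted scores Y = a + b*X for every integer X in [x_min, x_max],
    built by adding the constant b at each unit step.

    Fractions keep the running total exact across many additions.
    """
    a, b = Fraction(a), Fraction(b)
    y = a + b * x_min
    table = {}
    for x in range(x_min, x_max + 1):
        table[x] = y
        y += b          # one addition per raw-score step
    return table


# Hypothetical conversion: Y = 200 + 2.5 X for raw scores 0 through 4.
for raw, converted in conversion_table(200, "2.5", 0, 4).items():
    print(raw, float(converted))
```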


Super, D. E., and Roper, S. A. “An Objective Technique for Testing
Vocational Interests.” Journal of Applied Psychology, XXV
(1941), 487-498.

A technique developed for testing vocational interests objectively
is described. Pictures and films depicting different phases of various
occupations are used. The assumption is made that memory for what is
seen will be greatest in the field of greatest interest. Methods of validation
are described. The test of interest in nursing was administered to
nurses and 111 high-school students, 36 of whom planned to enter
nursing. Intelligence, previous knowledge, and success in nursing school
influence the scores slightly or not at all. The present test showed no
correlation with the Strong Vocational Interest Blank. The authors
conclude the two are equally valid, but that the former measures degree
of interest, whereas the latter compares the interests of subjects and
those in the field. L. Birdsall.




Traxler, A. E. “The Reliability of the Bell Inventories and Their
Correlation with Teacher Judgment.” Journal of Applied Psychology,
XXV (1941), 672-678.

Scores of 43 high-school pupils on the Bell Adjustment Inventory
and the Bell School Inventory have been correlated by the split-half
method. All the reliability coefficients are above .80, and some of them
are close to or above .90. Correlations between the scores and the
ratings by teachers and counselors on 33 pupils have been obtained. Four
of the six correlations are statistically significant. However, the correlations,
being low, fail to substantiate the validity of the inventories.
The author suggests that we should have a criterion that will be much
more defensible than a rating scale. Louise Grossnickle.
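A split-half reliability coefficient can be sketched as follows. The odd-even split and the Spearman-Brown step-up are assumptions made for illustration; the abstract says only that the split-half method was used. The item data are invented.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores holds one list of item scores per examinee.

    Correlates odd-item and even-item half scores, then applies the
    Spearman-Brown step-up to estimate full-length reliability
    (the step-up is an assumption, not stated in the abstract).
    """
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)


# Hypothetical responses: four examinees, four items scored 0 or 1.
rows = [[1, 1, 1, 1], [1, 0, 1, 0], [0, 0, 0, 0], [1, 1, 0, 1]]
print(round(split_half_reliability(rows), 3))
```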


Yum, K. S. “Primary Mental Abilities and Scholastic Achievements
in the Divisional Studies at the University of Chicago.” Journal of
Applied Psychology, XXV (1941), 712-720.

What particular combination of primary mental abilities is required
for success in the divisional studies of the physical, biological and social
sciences? The scores of 110 University of Chicago juniors were examined.
“According to the critical ratios, there apparently exists no significant
difference between the biological and social science groups.” The
mean profile of the physical science group (but not the total score) is
significantly different from the other two. Induction distinguishes physical
science men from biological science men, and deduction, space, and
induction distinguish physical science men from social science men. The
correlations of the factors with grades range from -.17 to +.52. “In
general, the verbal, inductive reasoning, and deductive reasoning factors
seem to correlate better with scholarship.” Helen M. Wolfle.





MEASUREMENT NEWS 

The Personnel Procedures Section, formerly the Personnel Research
Section, of the War Department has developed in recent months a variety
of classification, special aptitude, and achievement tests for the use
of the Army. The section is currently interested in the selection of
officer candidates and military specialists and in the training of physically
and mentally limited men.

Among the officers on active duty with the section are Major
Morton A. Seidenfeld, formerly of the National Tuberculosis Association;
Lt. Donald E. Baier, on leave from the Mental Hygiene Bureau
of the New Jersey State Hospital; and Lt. T. W. Harrell, who was in
charge of research for the section in a civilian status. Captain Sidney
Adams, formerly of the Employment Section of the Tennessee Valley
Authority, has left the section for duty in the field. Among the civilian
personnel of the section are: Dr. Clyde H. Coombs, formerly of the
University of Chicago; Dr. Louise R. Witmer, on leave from Florida
State College for Women; Dr. Bronson Price, formerly of Ohio State
University; Dr. Reign H. Bittner, also formerly of Ohio State University;
Mrs. Ruth D. Churchill, formerly of the University of Minnesota;
Dr. Alvin C. Eurich, formerly of Stanford University; and Mr.
Howard Uphoff, formerly of the U. S. Civil Service Commission.


Schools interested in building pupil morale for meeting war 
hardships will be interested in a “Test on the Effects of War” designed for 
the study of pupil morale and to identify war problems about which 
further instruction is needed. The test, prepared by Dr. Lee J. 
Cronbach, has been released by the School of Education of the State College 
of Washington, Pullman, Washington. The test is planned for grades 
10, 11, and 12, but may be used at higher levels. Seventy statements 
about conceivable future developments are presented, and the pupil is 
required to respond by indicating how likely he thinks each effect is. 
Responses are analyzed to determine how optimistic or pessimistic each 
pupil is. Since good morale depends on a realistic outlook and planning 
for future developments, both the highly optimistic or complacent pupil 
and the highly pessimistic, panicky pupil are pointed out as cases for 
individual guidance. An item analysis of the responses of the group 
indicates those particular war problems about which pupils appear poorly 
informed. 

The test is being made available as a professional service on a 
noncommercial basis to interested schools. For greatest value in planning 
the school program during wartime, the test should be given as early as 
possible. Question sheets, which may be used any number of times, 
sell for one cent apiece. Answer sheets, one of which is needed for 
every pupil tested, sell for five cents each. This charge covers the cost of 
producing the test and of a complete scoring service. All papers are 
scored, analyzed, and interpreted by the State College without additional 
charge. 

The test has been standardized on nearly two thousand pupils in the 
State of Washington, tested during January and February, 1942. The 
reliability of the Optimism score, on which the principal interpretations 
are based, is .77. 


The Committee of Examinations and Tests, Division of Chemical 
Education, of the American Chemical Society, has announced that the 
1942 Cooperative Chemistry Test will be available by April first. 
Inquiries should be addressed to the Cooperative Test Service, 15 
Amsterdam Avenue, New York City. 

The accumulation of data and experience in recent years has had the 
effect of modifying the concept of what the test should measure. As a 
result of extensive discussion at a conference held at the University 
of Chicago last June, the 1942 Form of the test is considerably different 
from the tests of the past four years. The test has been administered in 
a preliminary form to determine the difficulty and validity of each item. 
A brief description of the test follows: 

Part I. General Knowledge and Information. 

This section is based on knowledge of or acquaintance with 
important facts, definitions, laws and theories of chemistry. Historical events 
and application of chemistry to the social and economic world are 
represented. 

Part II. Application of Principles. 

This part attempts to measure the ability to solve numerical prob¬ 
lems, to balance equations, and to make quantitative predictions by the 
application of chemical principles. 

Part III. Scientific Method. 

This section is concerned with the understanding of the relation of 
observation, definitions, laws, and theories in the scientific procedure. The 
relation of theory to experiment is represented, as well as the ability to 
interpret chemical data. 

Part IV. Knowledge of Laboratory Technique and Procedure. 

This new section is included in the effort to measure acquaintance 
with the laboratory and knowledge of “correct” procedures. It does 
not attempt to measure skill or technique per se. 

The committee which is sponsoring this test is comprised of the 
following members of the Division of Chemical Education: 

B. Clifford Hendricks, University of Nebraska. 

Rufus D. Read, New Jersey State Teachers College. 

Ed. F. Degering, Purdue University. 

Laurence S. Foster, Brown University. 

Earl W. Phelan, Georgia State Womans College. 

Theodore A. Ashford, University of Chicago. 

Otto M. Smith, Oklahoma A and M College, Chairman. 


The Annual Report of the Scottish Council for Research in 
Education states that a mass of records (2,500 of scale L and 350 of scale 
M) has been collected with a view to standardizing the 
Terman-Merrill Revision of the Stanford Binet Scale for use in Scotland. It 
is hoped shortly to produce some evidence as to the suitability of this 
Revision for Scottish children. 

A report on the follow-up of the random sample of 1,000 children 
and of the high scorers in certain counties who were given the Binet 
test in the 1932 Mental Survey is awaiting publication. It is 
interesting to note that an independent analysis of occupations has been made 
and correlated with each I. Q. group. The relation of occupation to 
age and to class on leaving school has been worked out with respect to 
both initial and final occupations, that is, to those occupations entered 
upon leaving school and to those held for not less than one year 
immediately before the close of the survey. A geographical analysis, based 
on the Four Cities, urban areas excluding the Four Cities, and rural 
areas, has also been made. The Annual Report states that so far as can at 
present be ascertained the correlation between intelligence and initial 
occupation does not appear to be very high, but this relation is closer by 
the time the occupation held at the close of the follow-up is entered. 

Another report awaiting publication covers the results of an inquiry 
into methods of forecasting, at the qualifying stage, the pupil’s later 
success. The methods considered are the traditional examination, scho¬ 
lastic tests, an intelligence test and teacher’s estimate. It appears that 
the best combination, productive of the least number of misfits, is an 
intelligence test, an examination, and teacher’s estimate "scaled." 


The Psychological Classification and Research Sections of the Army 
Air Forces have established three Psychological Research Units. These 
units are located at Maxwell Field, Alabama, Kelly Field, Texas, and 
Santa Ana, California, and are headed respectively by Laurence F. 
Shaffer, Robert T. Rock, Jr., and J. P. Guilford. 

All aviation cadets are given psycho-motor and group tests for the 
purpose of classifying them for various duties in the aircrew. In 
addition to administering these tests, the units do some research on the 
general problem of determining the aptitudes needed for different aircrew 
duties, as well as develop methods for the prediction of success in such 
duties. 

These units are staffed by a group of officers and enlisted men. All 
the officers are well-qualified psychologists. Most of the enlisted men 
have done some graduate work in psychology and in addition have had 
some experience either in using psychological laboratory equipment or in 
the development, use, and validation of psychological tests. Periodically 
some qualified enlisted men are recommended for officer candidate 
schools. Successful completion of such schools leads to a commission. 

Men interested in enlisting for such positions should send the 
following information to the Army Air Forces, Office of the Air Surgeon, War 
Department, Washington, D. C.: (1) full name, (2) date and place 
of birth, (3) local board number and order number, (4) four 
personal references, and (5) complete work and educational histories, 
including a detailed description of specialized training in psychology. 
Individuals who expect to be inducted into the service soon and who desire 
to be considered for assignment to work in psychology should, in 
addition to the previous information, include (6) probable date of induction, 
stating whether notification of date of induction has been received, and 
(7) probable place of induction. 



EDUCATIONAL AND PSYCHOLOGICAL 

MEASUREMENT 


Volume II JULY, 1942 Number 3 


The Examiners Office of the University System of Georgia . . . 233 
F. S. Beers 

Levels of Competence in Counseling - A Post-War Problem for 
Student Personnel Work in Secondary Schools . . . 243 
Milton E. Hahn 

A Study of Some Local Factors Affecting Students' Scores on the 
Minnesota Personality Scale . . . 257 
Betty M. Horne and W. G. McCall 

The Place of Aptitude Testing in the Public Schools . . . 267 
Donald E. Super 

Effect of Engineer School Training on the Surface Development 
Test . . . 279 
Ruth D. Churchill, Jeanne M. Curtis, Clyde H. Coombs, 
and Thomas W. Harrell 

An Aid to Student Counselors . . . 281 
Ralph F. Berdie 

A Comparison of the Human Behavior Inventory with Two Other 
Personality Inventories . . . 291 
Abraham Sperling 

Intra-Individual Differences Versus Inter-Individual Differences 
in Motor Skills . . . 299 
William A. Owens, Jr. 

New Tests . . . 315 

Measurement Abstracts . . . 317 










Copyright, 1943, by 

SCIENCE RESEARCH ASSOCIATES 

PRINTED IN THE UNITED STATES OF AMERICA 



THE EXAMINERS OFFICE OF 
THE UNIVERSITY SYSTEM OF GEORGIA 


F. S. BEERS 
Social Security Board 


THE University System of Georgia is unique among the states. 
It is a centrally administered, governmentally supported 
organization of 15 colleges now completing its first decade. 
Whether state-supported higher education 
so conceived and so administered can and should endure is a 
question which is fittingly being tried out, as it were, in “the 
oldest chartered state university” and its branches. 

Before 1931 there were 25 state-supported colleges in 
Georgia, with a grand total of 365 college trustees. Each 
college operated as a unit, appealed to the legislature for financial 
support in competition with the other colleges, arranged its 
curriculum and its administration as it saw fit, and ordered its 
affairs to please itself. The older and stronger of the colleges 
used as their chief defensive weapon a policy of paring down 
or reducing in value the credits earned at the younger and 
weaker colleges, thus discouraging enrollment at these 
institutions and exacting tribute of students who transferred from 
them. 

In one stroke the Reorganization Act of 1931 swept this 
scramble into the discard. Ten colleges were abolished,¹ a 
single Board of Regents replaced the 365 local trustees, and a 
chancellor was set up as chief administrative officer. In the 
Chancellor and the Board of Regents was vested the authority 
for setting the educational and financial policies of the system 
of colleges and for reviewing the activities of colleges 
individually, as they might add to or detract from the effectiveness 
of service to the state. 

¹ The colleges surviving the reorganization were: The University of 
Georgia with its School of Medicine, The Georgia School of Technology, two 
senior colleges for women, one college for teachers, seven junior colleges, and 
three colleges for Negroes. The average annual enrollment in regular session 
is about 12,000 students. 

As part of the reorganization it was recommended that 
“at an early date there should be added to the Chancellor’s 
office an . . . officer properly trained in educational and 
statistical techniques [who] should be charged, under the 
supervision of the Chancellor, with the necessary duty of 
assembling, analyzing, and interpreting the regular and special 
reports of the operations of the several branches so as to make 
continually available in proper form for the Board of Regents 
that general information and other specific data upon which 
the Board may base its actions.” 

An office for this purpose was established in 1934 by order 
of the Regents and was located at the University,² that being 
considered the hub of the academic wheel whose circumference 
is the state system of higher education. The Regents wisely 
provided this new office with the nucleus of a bureau of stand¬ 
ards against which educational accomplishments and experi¬ 
ments could be measured and from which administrative poli¬ 
cies of individual colleges could be, directly or indirectly, 
evaluated. 

This provision included a basic curricular pattern of ten 
courses representative of general education, which was re¬ 
quired in all the colleges, the content of the courses having 
been determined by the faculties of the colleges in a series of 
conferences. 

To provide for the effective administration of these 
courses, information for their frequent revision, and a guar¬ 
antee that equal achievement on the part of students regard¬ 
less of college should be given equal credit, with right of trans¬ 
fer of credits without let or hindrance, the Regents authorized 
state-wide examinations on these courses and common inter¬ 
pretation of scores made by students taking them. 

² The University is in Athens; the Chancellor’s office is in the State Capitol, 
Atlanta. 



Administrative responsibility for this policy was assigned 
to [the office thus] set up, which later, by [common] 
acceptance, came to be known as the “Examiners Office.” 
Supervising and administering course examinations, 
however, were intended to be no more than partial bases of 
operations for more important duties and obligations of the 
office. 


The primary bureau of standards, consisting of 10 survey 
courses generally required, was augmented by authority to 
make use of a variety of devices for gauging the effectiveness 
of the program, among them measures of the relative quality 
of students electing to attend and those not electing to attend 
college, together with ways of improving selection; analysis 
of the physical well-being of students and its relation to 
mental acuity; evaluation of the amount of cultural background, 
skill, and intellectual power that the college environment 
provides, with applications to the individual problems of students. 

From the general framework and techniques of analysis 
that are employed in the attack upon these problems have 
come numerous adaptations that provide partial, and often 
rather full, answers to such questions as the relative cost to 
the state of general and special education; the relative 
effectiveness of each as judged by administrators, as observed 
in student opinion and experience, and as measured against 
outside criteria; optimum size of class enrollments; whether 
education conceived and practiced as a purely personal matter 
between instructor and student tends to crystallize and 
crumble more or less rapidly than when it is broadly administered 
and variously supervised, as, for example, under central as 
against local administration, or under divisional as against 
departmental jurisdiction. Nearly all of these problems 
revolve on the one hand around individual college prerogatives, 
and on the other around obligations of the colleges, 
collectively, to the state. 

How successful has this venture toward a university system 
proved itself to be? 


It is perhaps too early, after the lapse of approximately 
a decade only, to judge fairly whether the experimental 
attempt to establish the University System and maintain it 
through a research program will ultimately make a positive 
contribution to policy in higher education generally, as the 
founding of the first State University did so well over a 
century and a half ago. 

Of the successes or failures, those in administration are, of 
course, the most difficult to appraise. Any issues that point 
up a central administration as superseding local or college 
jurisdictions are bound to generate an appreciable amount of 
heat and to invite emphatic assertion of “states’ rights.” 
Hence, it is to be expected that patterns of incandescent light 
will flicker frequently, as they have, among the colleges of 
Georgia and will wax in strength within the faculties of 
individual colleges and departments as well. Attention must thus 
inevitably be divided, often on very plausible and sincerely 
held grounds, between these things and many admirable 
accomplishments of central administration such as are annually 
cited by the Chancellor in his report to the Governor. 

But in the less controversial sphere of service and research, 
it can be claimed with confidence that many of the techniques 
employed in the experiment toward a university system have 
not only served their immediate purpose well but also have 
proved useful in helping to set administrative policies. 

Course examinations in the basic curriculum have been 
the key to assembling data on the effectiveness of the 
educational program. These courses are set up on a five-hour, 
quarterly plan. There are three in the social studies, two 
each in the biological and physical sciences, two in humanities, 
and one in mathematics. Two in elementary English were 
added to this group for examination purposes in 1936, and 
two in chemistry in 1940. Since the war emergency, 
additional courses in mathematics and physics are also being 
included for the men students. 

Quarterly from 500 to 2,500 person-examinations are 
administered in each of the basic courses, with an average per 
course approximating 1,000. Over the period of a year about 
200 teachers take part in the instructional and examining 
programs. 

Administrative procedures for these programs are 
designed to furnish a framework within which the individual 
talents of teachers not only may be protected but also may be 
given direction. These procedures may be summarized as 
follows: 

1. Conferences and committees to formulate the aims of 
courses and the content-outline of examinations, and 
to consider the limitations imposed upon the objectives 
of examining by the average student to be served and 
by the extent of variability from this average of other 
students who likewise are to be judged on 
examination results; 

2. Participation by teachers in these general formulations, 
in the types of questions or tasks that will be required 
of persons to be examined, and in the final selection 
of materials to be included in an examination; 

3. Definite and regular assignments in question making 
for inclusion in examinations, with complete 
encouragement to offer innovations with respect to both form 
and content; and 

4. An office for analysis, research, and collation of the 
examining function with respect to construction, 
administration, and interpretation. 

Individual and group conferences are used extensively for 
the purpose of aiding members of the teaching staff in the 
preparation and improvement of examination questions. Item 
analyses are placed at the disposal of committees and 
instructors, and reports, digests, mimeographed material, and the 
like are made available for the information of staff members.³ 

³ For example, F. S. Beers and others, Some Principles of Examining, with 
Aids for Consulting Examiners (University of Georgia Press, 1942), 45 pages. 

Scaling of examination results is done by the Examiners 
Office, with the advice of Divisional Heads and after periodic 
canvassing of faculty opinion. Final grades in course work 
are assigned by individual teachers by means of a type of 
transmutation to letter grades of the average ranks on class 
work assembled by teachers and on scores on the final 
examination.⁴ Final grades are comparable from college to college 
and among the basic courses. This feature is imperative if the 
transfer of credits and students is to be effected on a 
legitimate basis; and in a university system it should take 
precedence over the more common practice in colleges of 
subordinating the examining function to the much vaunted “objectives 
of instruction.” It is apparent, however, that the two points 
of view need not be mutually exclusive. Those who make such 
a claim tend, perhaps, to think more with their bile than with 
their brains. 

Machine scoring of final examinations is done by the staff 
of the Examiners Office. From 9,000 to 13,000 answer sheets 
are scored quarterly in a period not exceeding four days. The 
method used is unique. Each college alphabetizes its answer 
sheets immediately after an examination, prepares an 
alphabetical list of student names in duplicate, packages both, and 
expresses or brings the packages to the central office. Here 
the procedure is as follows: 

1. The duplicate list of names is inserted in a typewriter 
set up adjacent to the scoring machine, and a typist 
is put in charge to record scores; 

2. A tally clerk equipped with printed forms is located 
in front of the scoring machine facing the operator; 

3. The scoring machine operator calls each name and 
score (part or whole as the case may be) but does not 
record it on the answer sheet; 

4. The typist and tally clerk record the scores on their 
respective forms from the “call” of the machine 
operator; 

5. A calculating machine operator summates the tallies 
by sections and colleges and runs the scale on the total 
distribution for the State; 

6. Names, raw scores, and the scale for transmuting 
scores into grades are put in an envelope and mailed 
special delivery to each college dean. 

⁴ See F. S. Beers and H. M. Cox, “Measurement or Marking?” Journal of 
the American Association of Collegiate Registrars (April, 1938). 
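
The six-step routine above amounts to recording each called score twice (once against the student's name, once as a tally), summing the tallies by section and college, and pooling them into one statewide distribution on which the scale is run. A minimal sketch in Python may make the bookkeeping concrete; the record layout, the college and student names, and the function name `tally_scores` are illustrative assumptions, not the Office's actual forms.

```python
from collections import Counter, defaultdict


def tally_scores(called_scores):
    """Tally raw scores by (college, section) and for the State as a whole.

    called_scores: iterable of (college, section, student, raw_score)
    tuples, standing in for the names and scores "called" by the
    scoring machine operator. This layout is an assumption.
    """
    roster = []                        # typist's list of names and scores
    by_section = defaultdict(Counter)  # tally clerk's forms, one per section
    statewide = Counter()              # total distribution for the State
    for college, section, student, raw in called_scores:
        roster.append((college, student, raw))
        by_section[(college, section)][raw] += 1
        statewide[raw] += 1
    return roster, by_section, statewide


# Hypothetical calls, for illustration only.
calls = [
    ("College A", 1, "Adams", 112),
    ("College A", 1, "Baker", 97),
    ("College B", 2, "Clark", 112),
]
roster, by_section, statewide = tally_scores(calls)
```

The statewide counter is the distribution from which a single scale could be derived and sent, with names and raw scores, to every dean.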

On the average, the total possible score per examination 
is 150 points, although some tests may have a possible total 
of more than double this figure. Rescoring has shown small 
errors, ±1 point, to be characteristic of about 5 per cent of 
answer sheets. Only very occasionally do large errors occur. 
As a check on these, deans are instructed to call for rescoring 
whenever a student’s displacement in rank between the 
examination score and his class work exceeds one letter grade 
or whenever an instructor makes a request for rescoring. 
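
The rescoring rule can be stated as a simple check. The sketch below assumes a five-letter scale (A through F) and that both the examination result and the class work have already been expressed as letter grades; the scale and the function name `needs_rescoring` are assumptions made for illustration.

```python
# Assumed five-letter scale, ordered from lowest to highest.
GRADES = ["F", "D", "C", "B", "A"]


def needs_rescoring(exam_grade, class_grade, instructor_request=False):
    """Return True when an answer sheet should be called back for rescoring.

    A sheet is flagged when the examination and class-work grades differ
    by more than one letter grade, or when an instructor asks for it.
    """
    displacement = abs(GRADES.index(exam_grade) - GRADES.index(class_grade))
    return instructor_request or displacement > 1
```

Under these assumptions a student graded A on the examination and C on class work would be flagged, while a B against a C would not.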

Item analyses of questions used in the examinations on 
the basic courses prove extremely valuable in the selection of 
items for subsequent inclusion in freshman placement and 
sophomore comprehensive tests. Each year a battery of such 
tests is constructed, covering general education. The parts 
are divided so that approximately equal weight is given to 
scientific and verbal skill, roughly paralleling the Q and L 
scores for the American Council Psychological Examination. 
Repeated samples on students taking both the A.C.E. and the 
Southeastern Aptitude Examinations yield coefficients of 
correlation with a median value of .90. 
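
A median coefficient of this kind can be obtained by computing a correlation for each sample of students and taking the middle value. The sketch below uses the ordinary product-moment (Pearson) coefficient, which is an assumption, since the article does not say which coefficient was computed; the function names and the sample data are hypothetical.

```python
from math import sqrt
from statistics import mean, median


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def median_correlation(samples):
    """Median of the coefficients from several (x_scores, y_scores) samples."""
    return median([pearson_r(x, y) for x, y in samples])


# Hypothetical samples: each pair holds one group's scores on two tests.
samples = [
    ([1, 2, 3], [2, 4, 6]),
    ([1, 2, 3], [3, 2, 1]),
    ([1, 2, 3], [1, 3, 2]),
]
r_median = median_correlation(samples)
```

Reporting the median rather than the mean keeps one unusual sample from dominating the summary figure.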

The Southeastern Aptitude Examinations are constructed 
in April of each year, are first administered to sophomores as 
“comprehensives,” and in the following fall are given to 
freshmen as placement tests. Statistical comparability for 
successive editions is based on the assumption that the freshman 
and sophomore populations are substantially the same in 
ability and achievement from year to year. This assumption is 
checked periodically by means of sampling with the same form 
of the A.C.E. 

The framework of placement and end-of-the-year 
sophomore testing supplies a valuable reference for numerous 
studies. Relative gains over the first two years in college by 
fields may thus be estimated; and the results, when placed 
at the disposal of committees on course content, have been 
found useful for revision purposes. The general or 
composite indices on the tests for freshmen supply “expectancies” 
that have been valuable in reforming grading practices in the 
non-basic or pre-professional curricula, in which marking has 
been found to be, on the average, a letter grade higher than 
in the basic courses, besides being for the most part extremely 
unreliable. Placement and sophomore test scores likewise 
make possible predictive studies of general ability in relation 
to achievement in the basic courses, where grading is 
relatively reliable and comparable from college to college. The 
part scores as well as the general score index may also be 
used for similar inquiries. 

Coupled with the placement testings are centrally 
administered physical examinations. Medical staff officers and 
medical college seniors give their services for this purpose. The 
examination blank is set up for Hollerith tabulation and 
includes, besides quantification of clinical findings, a 
socio-economic scale and an index of emotional stability. Tabulation 
of the data makes possible, together with the “paper and 
pencil” testing, a variety of studies bearing on the physical and 
mental development of students coming from many different 
types of environment. 

Surveys of student opinion of college work have been 
demonstrated as worth while in shedding light upon the 
effectiveness of educational practices and in comparing, from 
this point of view, the relative quality, difficulty, and 
popularity of the basic and pre-professional curricula. The setting 
is especially favorable to useful measures of student opinion, 
since approximately half of the curriculum at the junior 
college level is composed of basic courses common to all students 
and half of pre-professional or vocational courses.⁵ 

⁵ “Student Opinion of College Courses, 1937 and 1940,” Examiners Office 
Bulletin, September, 1940, University of Georgia Press. 

All examinations, forms, questionnaires and the like are 
prepared centrally by the photo-offset process. Collectively, 
the examinations of all kinds for a single year approximate 
200,000 copies. About 35 per cent of these are used outside 
of the University System, by colleges and high schools in the 
Southeast. 

All data from examinations, periodic and occasional reports 
and studies, and general conclusions about educational 
policy growing out of the services and research are made 
available through conferences, correspondence, and formal 
documents to the Chancellor, the Board of Regents, and the 
presidents and faculties of the 15 colleges of the System. A 
University System Council of which the Examiner is executive 
secretary formulates the educational policies for the System 
and recommends its findings to the Chancellor and Board of 
Regents for action. 



LEVELS OF COMPETENCE IN COUNSELING - A 
POST-WAR PROBLEM FOR STUDENT 
PERSONNEL WORK IN SECONDARY SCHOOLS 

MILTON E. HAHN 
University of Minnesota 


MANY THOUGHTFUL secondary school 
administrators are deeply concerned with the readjustment 
problems which will face the United States and its educational 
institutions after the present world conflict. The depression 
years of the past decade gave a fore-taste of the services 
which will be demanded of schools and their personnel 
workers in a post-war world. The inadequacy of student 
personnel work between 1929 and 1940 was brought sharply home 
to our high schools by the creation of new governmental 
agencies which were established to compensate for the 
shortcomings of public education. In many communities the first 
professional guidance services for youth were introduced by 
the National Youth Administration, the Civilian 
Conservation Corps, the Work Projects Administration or the 
Federal-State Employment Service. The work of these agencies, 
coupled with the relatively careful man-job analyses being 
made by the personnel divisions of the armed services, raises 
a serious question as to whether or not the public will accept 
traditional hit-or-miss methods in preparing post-war youth 
for meeting its responsibilities. School administrators face 
conditions which demand constructive action if their 
institutions are to retain the high public esteem and financial support 
they have enjoyed in the past. 

Student personnel work is a relatively new educational 
configuration in secondary schools. For two decades 
beginning about 1909, the major emphasis was upon treating 
individuals and their particular complex problem patterns with 
group methods, paralleled during the second decade by the use 
of tests in college and industry. The third decade of the 
movement was devoted to a search for tools and techniques more 
valid and reliable than the lecture and casual conferences 
between a student and a teacher. This decade contributed much 
to the methodology of job analysis. It was also marked by 
the flowing together of these two movements. With the 
emergence of better tools and techniques for man analysis, 
competent general counselors began to be trained and utilized in 
colleges, universities, and large secondary schools. During 
the 1930-40 decade the leaven of professionally trained 
student personnel workers spread unevenly over the country into 
small colleges, junior colleges, and high schools enrolling 
less than 500 students. 

This decade also was marked by attempts to define and 
describe student personnel work.¹ The older term, guidance, 
had, because of disputes as to its nature, become more and 
more meaningless. Various schools of thought stretched 
“guidance” to mean vocational guidance only,² to be a 
synonym for education,³ and to cover the ordinary non-lecture 
activities of classroom teachers alone.⁴ There is no present 
definition of either guidance or personnel work which is 
generally accepted by all workers with youth problems. The 
matter of definition is of interest to us here only because it is 
necessary to limit the scope of our materials. Personnel work 
with secondary school students must be broadened in scope to 
include responsibilities in certain directions for out-of-school 
youth. Therefore the following working definition is offered 
as a frame of reference for this article. 

¹ The reader interested in the development of student personnel work is 
referred to the following sources: W. H. Cowley, “The Nature of Student 
Personnel Work,” The Educational Record, April, 1936; George E. Myers, “The 
Nature and Scope of Personnel Work,” The Harvard Educational Review, 
January, 1938; Donald G. Paterson, “The Genesis of Modern Guidance,” The 
Educational Record, January, 1938. 

² H. D. Kitson, “Getting Rid of a Piece of Educational Rubbish,” Teachers 
College Record, XXXVI (October, 1934). 

³ John M. Brewer, Education as Guidance (New York: Macmillan, 1932). 

⁴ J. E. Walters, Individualizing Education (New York: John Wiley & Sons, 
1935). 

Personnel work with youth is the marshalling, under the 
best obtainable professional leadership, of educational and 
other community resources to aid individual youth, in and out 
of school, to help themselves toward optimal resolutions of 
immediate and long-range problems in the various 
life-problem areas. 

Because the average community is small and because the 
majority of workers with youth are found in the schools, the 
personnel program will tend to center about the secondary 
school for all community youth. Although this will be a 
typical situation, it will be necessary for the educational per¬ 
sonnel worker employed in the educational system to refer 
many problems to other professional workers in the geo¬ 
graphic or political district. At what point does the person¬ 
nel worker in our schools face the necessity for referral of 
cases? A partial answer can be obtained through consideration
of arbitrarily selected categories of personnel workers
and the estimated competence for the average individual in
each category relative to counseling effectiveness. Such an
approach requires consideration of many variables and complicates
verbal presentation. Again the writer exercises his
prerogative of being arbitrary and for the sake of simplification
selects the variables to be introduced. We shall consider
teacher-counselors, vocational specialists, and clinical counselors
as the categories of youth personnel workers. Life-problem
areas will be represented by vocational problems and
educational problems. Levels of case history interpretation
and use of tools and techniques of the counselor are selected 
as the axes upon which our worker categories and life-prob¬ 
lem areas will be considered. 

Life-Problem Areas 

Personnel work with individuals is necessary because they
have problems which they are unable to resolve satisfactorily 
unaided. These problems occur in tangled patterns in which 
it is frequently impossible to separate one general kind of 



problem from another or show clearly which is cause and
which is effect. Because it is impossible adequately to verbalize
a whole problem pattern, we must discuss the interrelated
problems as if they were discrete phenomena. A commonly- 
used categorization of life-problems includes vocational, edu¬ 
cational, personal, health, and financial adjustments. For our 
purposes we consider only the first two. 

Vocational problems are those in which lack of adjustment
is caused by poor choices of vocational field or level, no choice,
or uncertainty of choice with the need for competent assurance
or advice. Vocational problems may be considered in
some aspects as phases of educational problems. Although
vocational problems often are treated as if they were simple
in nature, they are extremely complicated in many individuals.
Sound vocational counseling requires that the counselor be
familiar with the theory and clinical usage of the psychological
concepts of abilities, aptitudes, and interests. Because
of this, reliance upon untrained counselors and self-analysis
has been discarded by the best practitioners. The case against
these traditional methods has been stated ad nauseam, but
these methods are still utilized in many secondary school guidance
programs. A recent study by Stone⁵ presents further
reasons for questioning present common treatment of vocational
problems in adolescents.

Student educational problems are those caused by being
thwarted in whole or in part in the attempt to proceed through
a training program (usually formal) toward a goal. For
many youth this goal is occupational in nature. As has already
been said, vocational and educational problems are very
frequently different aspects of the same general condition.
Educational problems, like others, range from the very simple,
such as a choice between afternoon or morning classes, to the very
complex, such as a complicated reading disability requiring
special remedial work. If possible, we have placed even

⁵C. Harold Stone, “Evaluation Program in Vocational Orientation,” Studies
in Higher Education, Biennial Report of the Committee on Educational Research
(Minneapolis: University of Minnesota Press, 1938-1940), pp. 131-145.



greater reliance upon self-analysis for resolution of educa¬ 
tional problems than has been true of other kinds of problems. 
The tragic results of past treatment of the educational prob¬ 
lems of youth fill the literature. A pointed commentary on 
our educational counseling is found in the New York Regents
Study.⁶

The Teachers’ Level of Counseling Competence 

In the student personnel programs of many secondary 
schools the teacher is the personnel worker. There are a 
number of reasons why this condition exists. The most im¬ 
portant factor contributing to such programs is the concept 
of guidance held by so many secondary school administrators. 
To believe that teachers trained chiefly for classroom teach¬ 
ing can deal adequately with the serious problems of youth 
implies that the follower of this creed also believes that: 

Student self-analysis has high validity and reliability.

Student problems are seldom serious.

Professional workers are not needed.

Teachers have enough free time to know each student
intimately and discharge counseling responsibilities.

Tools and techniques beyond interviews and school
grades and their interpretation are not worth employing
or are quickly learned by classroom teachers.

The weaknesses of the teacher-counselor type of program 
in which many or all teachers consult on all kinds of student 
problems are manifold. A chief weakness of this type of 
program is the narrow range in which counseling competence 
exists. 

The outline illustrates this narrow range of counseling 
competence. It presents crude continua of data interpretation 
for two problem categories—vocational and educational. It is 
relatively safe to assume that in neither of these continua does 

⁶Francis T. Spaulding, High School and Life, The Regents’ Inquiry (New
York: McGraw-Hill, 1938).




the classroom teacher compare favorably with counselors
trained to interpret complex interrelated data.


OUTLINE OF
LEVELS OF COUNSELING

INTERPRETATION TO STUDENTS OF EDUCATIONAL PROBLEMS

Unvalidated Personal Impressions | Grades, Ratings, Student Choices | Simple Statistical Treatment of Measurement | Sophisticated Statistical Treatment of Measurement | Pattern Analysis of Individual from Synthesized Data

INTERPRETATION TO STUDENTS OF VOCATIONAL PROBLEMS

Unvalidated Student Choices | Occupational Application of Specific Subject Matter | Relation of School Subjects to World of Work | Occupational Information | Simple Statistical Treatment of Measurement | Sophisticated Statistical Treatment of Measurement | Pattern Analysis of Individual from Synthesized Data

Teacher Counseling and Educational Problems.—In
the area of educational problems the average teacher works
for the most part with unvalidated student statements or with
equally invalid personal impressions. Many teachers interpret
their grades and ratings to youths seeking counsel. A few
teachers have become skilled to an extent that they can interpret
various kinds of data in terms of simple statistical concepts.
A very few are competent to interpret complex, related
data, dependent for its meaning upon great statistical sophistication.
A rare individual can utilize job and man analyses
in such a way that proper interpretation is supplied the counselee.
Ineffective educational counseling by secondary school
teachers is not a matter of speculation or assumption. Eckert
and Marshall⁷ state that more than three of five high-school
students in New York State leave school before graduation.
Many of those leaving school do so because of inability to
meet the demands of the curricula in which they attempt to
compete. Many of these students could profit from courses
of study different from the ones in which they failed.


⁷Ruth E. Eckert and Thomas O. Marshall, When Youth Leave School,
The Regents’ Inquiry (New York: McGraw-Hill, 1938), pp. 48-49.

248 





Edgerton and Toops⁸ estimate that only 34 per cent of
1,958 students included in a survey of Ohio college students
had records indicating ability for eventual college graduation.
Many of these students must have been advised by high-school
teachers and administrators to attempt higher education at the
college level. The literature of secondary school and college
mortality points clearly to a large amount of poor advising
about educational problems. Williamson⁹ offers a satisfactory
summarization of educational counseling by teachers
when he says:

Fewer students would select inappropriate courses if re¬ 
liable statements of requirements of the wide variety of occu¬ 
pations and professions open to high-school graduates were 
available. . . . The counseling use of such information would
enable more students to prepare for appropriate occupational 
goals. 

Many scholastic failures could be avoided if administrators 
and teachers would establish comparable and valid standards, 
so that students and counselors could better judge of future 
success in a course by past achievements in a related course. 

Teacher Counseling and Vocational Problems.—Because
educational and vocational problems of youth are so closely
related, much that has been said of teacher counseling and
educational problems can also be said of vocational problems.
The effects of poor vocational counseling upon the boy or girl
are often more serious than those of poor educational counseling.
If a student takes a course for which he is not suited,
adjustments can be made through failure or change of course.
These adjustments do not require long time spans. Poor
vocational counseling can result in situations where the unadjusted
individual may be forced by circumstances to spend
long periods in which change is difficult or impossible. One
has only to inspect the occupational choices of high-school

⁸Harold A. Edgerton and Herbert A. Toops, “Academic Progress,” Contributions
in Administration 1 (Columbus, Ohio: Ohio State University Press,
1929), p. 136.

⁹E. G. Williamson, How to Counsel Students (New York: McGraw-Hill,
1939), pp. 260-261.




seniors to realize the unrealistic thinking which can supply
fertile ground for poor advising. Stone¹⁰ demonstrates that
in one freshman group at the University of Minnesota only
40 per cent of the students had occupational goals which were
judged valid by competent clinical psychologists. A goodly
proportion of these students came from high schools which
have prided themselves upon their teacher counseling programs
over a long number of years. Students from these
schools had no better choices than those from schools which
made no pretensions to vocational counseling. Stone’s study
indicates that counseling by professionally trained, clinical
counselors reduces the number of poor vocational choices and
increases the number of good choices. The gains are statistically
significant. Williamson and Bordin¹¹ found that
control-group (uncounseled) college students achieved a vocational
adjustment judged to be satisfactory by themselves and
the evaluating judges in 68 per cent of the cases. On the other
hand, such an adjustment was achieved by 81 per cent of the
cases in the experimental group (counseled by clinical counselors).
Satisfactory adjustment was not made by 27 per
cent of the control group and 15 per cent of the experimental
group. These differences are statistically significant.

Large numbers of high schools depend upon classroom 
teaching of occupational information to resolve vocational 
problems of students. This faith in “talking at” students has 
little to recommend it. Many of the studies which show advantage
to classroom group-counseling do so only in terms of
gains in amount of occupational information. No one has
produced evidence that students with the greatest amount of
occupational information make the best vocational choices. It
is obvious that a student who takes a course in any field of 
knowledge should know more about it than the student who 
has not had the same or similar courses. 

Many useful tools and techniques of counseling have been 

¹⁰C. Harold Stone, op. cit.

¹¹E. G. Williamson and E. S. Bordin, “Evaluating Counseling by Means
of a Control-group Experiment,” School and Society, LII (1940), 434-440.



originated or improved in the past decade. Teacher-coun¬ 
selors are seldom trained to collect and collate data originat¬ 
ing from these instruments and methods. They cannot be 
expected to be both teachers and applied psychologists. If 
teachers become clinical counselors, they no longer are class¬ 
room teachers. We need not expect adequate counseling in 
regard to the vocational problems of youth until our schools
make use of persons other than classroom teachers to assume
and discharge at least supervisory responsibilities for counseling.
As will be stated later, this does not mean that each
small school unit must or should have a professionally trained
counselor or close up shop.

Referral to the outline on page 248 indicates that rela¬ 
tively few teachers are so trained that they can interpret data 
to students adequately if such data involve more than the 
presentation of information of a simple nature. Sound job
analysis by teachers offers serious difficulties. Valid man analysis
is beyond the ken of the average teacher.

The Vocational Specialist’s Level of Counseling Competence

The vocational specialist appeared on the personnel work 
scene in the second decade of the developing secondary school 
personnel work movement. A growing public sense of need 
to meet the pressing problems of youth forced educators to 
take cognizance of these problems. The movement was first 
directed toward emphasis upon “things to be done” rather
than toward “men and women who do things”—job analysis,
not man analysis. This trend was clearly reflected in the
proposed qualifications for counselors which appeared in the
literature of that period. Myers,¹² for example, wrote:

It is well to remind ourselves, however, that among the 
qualifications, aside from special training, which those who 
select counselors often emphasize are: (1) a personality
which attracts and gets on well with adolescents; (2) suffi¬ 
cient maturity to command the respect of pupils and fellow 

¹²George E. Myers, “A Training Program for Counselors,” Vocational
Guidance Magazine, V (1927), p. 315.



teachers; (3) at least as good a general education as is possessed
by the average high school teacher, usually represented
by graduation from a college in good standing; (4) successful
experience as a teacher; and (5) preferably, some business
or industrial experience. (Italics not in original.)

The Committee on Standard Certification of Vocational 
Counselors, a committee of the National Vocational Guidance 
Association, advocated the following college courses for permanent
certification:¹³

1. General courses—the usual courses required of candidates
for teaching certificates: educational psychology, principles
of teaching, educational measurements, sociology, economics.

2. Related courses—principles and problems of vocational
education, industrial history, labor problems.

3. Guidance courses—principles and problems of vocational
guidance, analysis of vocational activities, methods of
imparting occupational information, psychological tests in
guidance, counseling the individual, placement and follow-up,
and field work in guidance.

Inspection of these training programs indicates clearly the
emphasis placed upon the job and the worker’s relationships
to it. Adequate tools and techniques had not as yet been discovered
to analyze and treat the individual as someone to
whom a particular set of duties could be fitted. Practice was
to treat the job as a set of duties to which a man or woman
must be fitted. Vocational specialists dealt with vocational
problems, i.e., job specifications, occupational trends, labor
problems, how to get a job, placement, and follow-up. Voca¬ 
tional aspects of total adjustment were considered so impor¬ 
tant that other aspects of human adjustment were often over¬ 
looked or left to be treated by specialists not yet found in 
secondary schools. 

The unfortunate feature of the era of vocational special¬ 
ists is not that personnel work passed through the stage, but 
rather that the stage has persisted. Too many vocational 

¹³Leonard V. Koos and Grayson N. Kefauver, Guidance in Secondary
Schools (New York: The Macmillan Company, 1932), pp. 569-573.



specialists continue to think of personnel work as job descrip¬ 
tion and placement long after educational leaders have rele¬ 
gated their particular contribution to an important but sub¬ 
sidiary position in the field. 

Vocational interests are no longer what boys and girls say 
they want to do (usually stated as a job label). Vocational 
interests are analyzed today by considering youths’ claims in 
the light of observed behavior over a period of time and 
the leads furnished by various technical psychological measuring
instruments. Selection of a career is no longer in terms
of whether or not an individual can do a job. Rather the
question is raised regarding what family¹⁴ of jobs and at what
level within this family the optimal vocational adjustment will
tend to occur. We are not so prone to encourage an adolescent
in a secondary school to be a doctor of medicine. We
tend to direct to “scientific fields at the professional level.”

Evidence of failure to meet the vocational problems of
youth through the services of vocational specialists is abundant
in the literature. Anderson¹⁵ made a strong case for
psychiatric services in industry. The conditions cited by Anderson
raise questions as to the ways in which the men and
women studied were guided to their occupational niches.
Fisher and Hanna¹⁶ also contributed evidence that an alarming
number of workers were not being directed or helped into
suitable careers. There is little evidence to show that the 
worker who had the help of the vocational specialist made 
significantly better occupational choices than his non-counseled 
brother. 

Reference to the outline leads one to suspect that the vocational
specialist has been handicapped in his work by his
general inability to deal with man analysis. His contribution
to personnel work, however, has not been small. The shift

¹⁴Donald G. Paterson, Clayton d'A. Gerken, and M. E. Hahn, Minnesota
Occupational Rating Scales (Chicago: Science Research Associates, 1941).

¹⁵V. V. Anderson, Psychiatry in Industry (New York: Harper and
Brothers, 1929).

¹⁶V. E. Fisher and Joseph V. Hanna, The Dissatisfied Worker (New York:
Macmillan, 1931).



to emphasis upon men rather than jobs was hastened appreciably
by his efforts. Nevertheless, general responsibility for
counseling of youth can hardly be delegated to the vocational
specialist. Except for a minor specialty, he is no more competent
than the teacher-counselor.

The Clinical Counselor’s Level of Counseling Competence

The clinical counselor is a highly trained, widely experienced,
applied psychologist. His training has been directed
primarily to an understanding of people both as unique individuals
and as members of various groups. He is a specialist
in one or more areas of human problems. At the same time
he is a generalist, albeit with knowledge of his limitations.
The place of the clinical counselor in the field of personnel
work is well stated by Williamson¹⁷ when he says:

While clinical counseling is only one of several specialized 
fields dealing with personal problems, we maintain, however,
that it is the basic type of personnel work with individual students
and serves to coordinate and focus the findings and
efforts of other types of workers.

Paterson, Schneidler, and Williamson¹⁸ contend that persons
training to be clinical counselors should complete the
Master’s degree in psychometrics or its equivalent. Further
they consider the Ph.D. degree or its equivalent in technological
psychology as desirable. The counselor so trained
should be a master of the tools and techniques that fill the
competent counselor’s kit. The clinical counselor should be
competent in the full range of data interpretation found in
the outline.

It is not our present purpose to analyze the competency 
and functions of the counselor. It is safe to assume that such 
a one is, at the present time, our best trained personnel worker 

¹⁷E. G. Williamson, How to Counsel Students (New York: McGraw-Hill,
1939), p. 36.

¹⁸Donald G. Paterson, G. Schneidler, and E. G. Williamson, Student Guidance
Techniques (New York: McGraw-Hill, 1938), pp. 302-303.



with the vocational-educational problems of youth. Evidence 
has been marshalled which indicates that counselors at this
high level of competence do achieve appreciably better outcomes
of counseling than is true of other individuals working
with vocational-educational problems of youth. It is our pur¬ 
pose to urge that we make use of these people even in small 
school systems which cannot add them to their full-time staffs. 

The average American high school is small. Sound school- 
community personnel programs must include the pooling of 
resources in order that these small schools can have many 
kinds of services which they could not afford alone. Administrators
in small secondary schools will find that complete youth
personnel programs are beyond them when they consider only
the community which they serve. When consideration is given
to the combined resources of five, six, or seven schools, there
are practical solutions to the problem. Sharing the services of
clinical counselors is such a solution. When an administrator
faces an in-service training program for teacher-counselors
with no professional assistance, he is involved in difficulties.
When he faces in-service training of teacher-counselors as part
of a county or district project in charge of a competent instructor,
many of these difficulties disappear.

Few small communities can supply enough personnel work 
with youth to occupy the full time of a clinical counselor who 
concentrates upon vocational-educational problems. Part-time 
aid from such a counselor will, in many instances, be all that is
needed to develop the school-community personnel program.
A qualified counselor can discharge personnel functions in several
schools. For example, research on problems common to
several schools may be conducted almost as easily as for a
single institution. In-service training in a district may be little
more difficult than in a single institution. Counseling of difficult
cases in a district may involve no greater case load than
would be true in a single large institution or community. Development
of several sound school-community programs at
one time in cooperation with other personnel agencies is not an
unreasonable task.




We have had time to discover the gross errors in assigning
major counseling responsibilities to teachers and vocational
specialists. We have not yet had time to correct these errors.
Personnel work with youth in the post-war period will go forward,
although there is no guarantee that the public schools
will retain the golden opportunity they have had to develop
the field. If secondary school administrators will forget tradition
and face the tasks ahead realistically, if they will turn
from subject matter and “things for people to do,” if they
will make use of the best personnel workers, they may yet
retrieve the losses in public support and esteem which they
suffered in the depression and war years. Much depends upon
the level of counseling effectiveness which the schools achieve.





A STUDY OF SOME LOCAL FACTORS AFFECTING 
STUDENTS’ SCORES ON THE MINNESOTA 
PERSONALITY SCALE 

BETTY M. HORNE and W. C. McCALL
University of South Carolina


A FRESHMAN TESTING program including tests of
general scholastic aptitude as well as tests of achievement
and aptitude in specific subject fields has become common
practice in our colleges and universities. The results of
these tests are customarily used in counseling with the student 
on academic and vocational problems, and in predicting the 
student’s probable academic success. 

Many institutions also include in their testing program
instruments which are designed to measure the student’s vocational
interests and aptitudes and scales designed to evaluate
his personality adjustment. Both the felt need for information
of this nature as an aid to more effective guidance
procedures and the increased reliability and validity of recent
measuring instruments have undoubtedly influenced this trend.

However, the effect of local factors on all test scores and 
more particularly on tests of personality has long been recog¬ 
nized. The present study reports a systematic attempt to ana¬ 
lyze these factors in a specific situation. In September, 1941, 
the Minnesota Personality Scale was added to the regular 
list of freshman tests at the University of South Carolina. 
The test was thus administered to 241 freshman men and 
144 freshman women. 

Two different forms of the tests have been published, one 
for men and one for women. The total scale consists of five 
sub-scales measuring, respectively, Morale, Social Adjustment, 


Family Relations, Emotionality, and Economic Conservatism. 
The authors of the test, Darley and McNamara, describe 
these sub-scales as follows: 

Part I—Morale: High scores are indicative of belief in
society’s institutions and future possibilities. Low scores
usually indicate cynicism or lack of hope in the future.

Part II—Social Adjustment: High scores tend to be characteristic
of the gregarious, socially mature individual in
relations with other people. Low scores are characteristic
of the socially inept or undersocialized individual.

Part III—Family Relations: High scores usually signify
friendly and healthy parent-child relations. Low scores
suggest conflicts or maladjustments in parent-child relations.

Part IV—Emotionality: High scores are representative of
emotionally stable and self-possessed individuals. Low
scores may result from anxiety states or over-reactive
tendencies.

Part V—Economic Conservatism: High scores indicate
conservative economic attitudes. Low scores reveal a
tendency toward liberal or radical points of view on current
economic and industrial problems.

As is customary at the University of South Carolina,
norms for the local population were set up and have been
used in rating all students. A comparison of these norms
with those published for the University of Minnesota population
on which the test was standardized comprises the first results
of this study, and is presented in Table 1.

Comparison of Minnesota and South Carolina Scores

Since the Minnesota norms are stated in terms of percentile
values, it was necessary to base all statistical calculations
on these values. Accordingly, the critical ratios are expressed
in terms of the difference between fiftieth percentile
points divided by the P.E. of the difference between these me¬ 
dians and must equal 4.0 or more in order to be statistically 
significant. Since there are different forms of the test for 
men and women, the critical ratios for the two sexes have been 
calculated separately. 
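
The criterion just described can be sketched in a few lines of Python. This is only an illustration, not the authors' computation: the article does not say how the probable error of each median was obtained, so the sketch assumes the usual normal-curve approximation (standard error of a median of about 1.2533 times the standard deviation over the square root of N, and a probable error of 0.6745 standard errors). The medians and Ns are those reported for men on Scale I in Table 1; the standard deviation of 20 raw-score points and all function names are hypothetical.

```python
import math

# Probable error (P.E.) = 0.6745 * standard error under a normal curve.
PE_PER_SE = 0.6745
# Standard error of a median is about 1.2533 * sd / sqrt(n) for normal data.
MEDIAN_SE_FACTOR = 1.2533


def pe_of_median(sd, n):
    """Probable error of a sample median (normal approximation)."""
    return PE_PER_SE * MEDIAN_SE_FACTOR * sd / math.sqrt(n)


def critical_ratio(median_a, median_b, pe_a, pe_b):
    """Difference between two medians divided by the P.E. of that difference."""
    pe_diff = math.sqrt(pe_a ** 2 + pe_b ** 2)
    return abs(median_a - median_b) / pe_diff


def is_significant(ratio, threshold=4.0):
    """Apply the article's criterion: a critical ratio of 4.0 or more."""
    return ratio >= threshold


# Medians and Ns for men on Scale I come from Table 1. The standard
# deviation of 20 raw-score points is HYPOTHETICAL, for illustration only.
ASSUMED_SD = 20.0
pe_minn = pe_of_median(ASSUMED_SD, 1083)
pe_sc = pe_of_median(ASSUMED_SD, 241)
ratio = critical_ratio(167, 172, pe_minn, pe_sc)
print(round(ratio, 2), is_significant(ratio))
```

Because one probable error equals 0.6745 standard errors, a critical ratio of 4.0 corresponds to roughly 2.7 standard errors of the difference.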





LOCAL FACTORS AFFECTING SCORES 


TABLE 1

COMPARISON OF SCORES OF MINNESOTA STUDENTS AND SOUTH CAROLINA
STUDENTS ON THE MINNESOTA PERSONALITY SCALE

                                      Sub-Scales
Median Raw Score               I      II       III     IV      V
Minn. men (N = 1083)          167     224      138     159     106
S. C. men (N = 241)           172     230      149     163     105
Critical ratio*               6.5     3.3**    11.3    3.6     1.
Minn. women (N = 888)         173     228      149     168     104
S. C. women (N = 144)         178     237      158     170     104
Critical ratio*               5.2     4.7**    7.8     1.1     0

*Critical ratio = Difference between medians divided by probable error of
this difference.

**Critical ratio of difference between Minnesota median and South Carolina
mean for women = 2.1, for men = 2.2.

The table indicates two areas in which the South Carolina
students appear to be better adjusted than the Minnesota students,
Morale and Family Relations. The critical ratios for
these scales are significant for both men and women. Data to
be presented below indicate that the latter difference, that in
Family Relations, is probably related to the relatively smaller 
number of large centers of population in South Carolina than 
in Minnesota. 

The explanation for the difference in general morale is 
less apparent. The items in this sub-scale may very roughly
be divided into three main groups, namely, questions concern¬ 
ing faith in the honesty and adequacy of our legal system, 
questions dealing with faith in the value and methods of our 
educational system, and faith in the possibilities which the 
future holds for the individual. It must be emphasized that 
this division is both arbitrary and rough, and since no data 
were available for an item analysis, no comparisons within 
this sub-scale are possible. A generalized statement based 
on the authors’ description of the area of personality adjust¬ 
ment measured by this scale would indicate that the South 
Carolina students had significantly more faith in society and 


its institutions and in their own future than did the Minnesota
students. 

Scores of entering South Carolina freshmen on the Test
of General Proficiency in the Field of the Social Studies of the
Cooperative General Achievement Tests indicate that their
acquaintance with the strengths and weaknesses of our American
social institutions is more limited than that of the 6296
freshmen on whom the test was standardized. The South
Carolina mean fell at the thirty-fifth percentile point for the
standardization group. It is possible that this lack of acquaintance
has tended to promote an uncritical acceptance of
these institutions. It is not improbable that this difference,
also, is related to the factor of population distribution, al¬ 
though the subsequent data do not strongly suggest such a 
conclusion. 

The scores on Scale II would suggest that the South Carolina
students are more interested in and better adjusted to
social group life. The nature of the questions on this scale
strongly suggests that it measures a factor closely resembling
the usual definitions of introversion-extroversion. Although
the foregoing statement may be interpreted as lending support
to the tradition of hospitality and sociability of the Southern 
home, the statistics also offer an alternative explanation. 

The distributions of scores on this particular sub-scale 
seem to be somewhat skewed, since the median women’s score
is 5 points higher than the mean and the median men’s score
is 2 points higher than the mean. The discrepancy of 5 points
in the women’s distribution is the greatest difference between
median and mean in any of the 10 distributions, the two differences
on Scale III being 2.5 points, and all others being
2 points or less. It is the only instance in which such discrepancy
affects the significance of the difference between the
scores of Minnesota and South Carolina students. As the 
footnote to the table indicates, the difference between the 
Minnesota median and the South Carolina mean for women 
on Scale II, divided by the probable error of the difference 
between the two medians, is only 2.1, a critical ratio which 


260 



LOCAL FACTOES AFFECTING SCOEES 


IS not Significant; the coiTesponding i-atio for men is 2 2. The 
reader may choose the explanation of these data which seems 
to him most logical and acceptable 

Relation of Scores to Size of Home Town 

We have referred above to an analysis of the data from 
South Carolina students based on size of population centers. 
The University is located in Columbia, the capital of the 
state and a city of about 75,000. Thirty-five to forty per 
cent of the students are from Columbia. The comparison of 
adjustment scores for students from population centers of dif¬ 
ferent sizes was originally suggested by a clinical observation 
that there seemed to be a disproportionate number of students 
from Columbia who showed one single low score on Scale 
III, Family Relations. 

The complete set of data is presented in Table 2. The
students were divided into three groups, according to their
home residence and the size of the town in which they attended
high school. Class A includes all students who resided
in and attended high school in cities of 25,000 and
over; Class B includes those who resided in and attended
high school in towns of 2500 to 25,000; and Class C those
who attended high school in a town of 2500 or less and resided
in a town of that size or gave a rural address.

TABLE 2

COMPARISON OF SCORES OF SOUTH CAROLINA STUDENTS FROM TOWNS
OF CLASS A, CLASS B, AND CLASS C*

                                      Sub-Scales
                            I      II     III    IV     V

Mean T-score value
  Class A (N = 183)         49     47     44     48     47
  Class C (N = 80)          51     50     47     48     49
  Class B (N = 91)          52     56     55     54     52

Critical ratios**
  Class A vs. Class B       1.0    3.6    4.3    2.3    2.1
  Class B vs. Class C       0.3    2.2    2.8    2.1    1.1
  Class A vs. Class C       0.6    1.1    0.9    0.2    0.6

*Class A: Towns with a population of 25,000 and over.
 Class B: Towns with population of 2500 to 25,000.
 Class C: Towns with population of less than 2500 and rural districts.
**Critical ratio = Difference between means divided by the standard error
of this difference.

It should be noted that Class B was originally divided
into groups from towns of 2500 to 10,000 and from towns
of 10,000 to 25,000. Differences in average score between
these two groups were generally small, the group from towns
of 10,000 to 25,000 being in most instances somewhat lower.
However, the number of students falling in the group from
10,000 to 25,000 was so small as to make any real conclusions
impossible. The present Class B is composed of 68 students
from towns of 2500 to 10,000 and 23 students from towns
of 10,000 to 25,000.

TABLE 3

MEASURES OF VARIABILITY FOR AVERAGES PRESENTED IN TABLES 1 AND 2

                                        Sub-Scales
                           I          II         III        IV         V

Median Raw Score ± P.E.
  Minn. men (N = 1083)     167±9      224±21.5   138±12.5   159±12     106±8
  S. C. men (N = 241)      173±8.5    230±20     149±10.5   163±12.5   105±S.S
  Minn. women (N = 888)    173±9      228±18     149±14     168±16     104±6
  S. C. women (N = 144)    17S±8.5    237±17     158±9.5    170±16.5   104±5

Mean T-score ± S.D.
  Class A (N = 183)        49±21      47±22      44±21      48±20      47±19
  Class C (N = 80)         51±22      50±19      47±20      48±19      49±20
  Class B (N = 91)         52±18      56±19      55±19      54±19      52±16


An explanation of certain statistical techniques used in
the calculations of Table 2 is necessary. All individual scores
for South Carolina freshmen were translated into T-score 
values determined from the means and standard deviations 
of the distributions of raw scores. The scores presented are, 
therefore, average T-score values, not average raw score 
values. The use of these T-score values provides a basis on 
which scores for men and women may be combined for statis¬ 
tical treatment. The T-scores for each sex are based on the 
scores of that sex, but a given T-score value for men and 
women is considered comparable. 
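
The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' worksheet: the raw scores are hypothetical, and the constants `center=50` and `spread=10` are the conventional T-score values, which the text does not state. The standard deviations near 20 reported for the T-scores in Table 3 suggest the authors may have used a different unit, so both constants are left as parameters.

```python
from statistics import mean, pstdev


def t_scores(raw_scores, center=50.0, spread=10.0):
    """Rescale raw scores linearly against their own group's mean and S.D.

    center and spread are assumed conventional values (50 and 10);
    the article does not give the constants it used.
    """
    group_mean = mean(raw_scores)
    group_sd = pstdev(raw_scores)  # S.D. of the group itself
    if group_sd == 0:
        raise ValueError("all scores are identical; T-scores are undefined")
    return [center + spread * (x - group_mean) / group_sd for x in raw_scores]


# Hypothetical raw scores; each sex is converted against its own distribution.
men_raw = [150, 160, 170, 180, 190]
women_raw = [155, 165, 175, 185, 195]

men_t = t_scores(men_raw)
women_t = t_scores(women_raw)

# Once on the common scale, the two sexes can be pooled for group averages.
combined = men_t + women_t
print(round(mean(combined), 1))
```

Because each sex is scaled against its own mean and standard deviation, a man and a woman standing equally far above their respective group means receive the same T-score, which is the sense in which the values are treated as comparable.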

For convenience in reading, Class C is shown between
Class A and Class B in Table 2. The evidence provided by
the table is both striking and self-explanatory. On every scale
the average scores of students gradually increased from Class 
A through Class C to Class B. At least for South Carolina 
high school students the environment most conducive to per¬ 
sonality adjustment seems to be a Class B community; that is, 
a town from 2500 to 25,000. Very small towns or rural dis¬ 
tricts appear somewhat more favorable than metropolitan 
centers. 

Critical ratios presented in this table are based on the 
difference between means divided by the standard error of this 
difference, and are thus significant at 3.0. The differences 
between Class A and Class B students on Scales II and III, 
Social Adjustment and Family Relations, meet this criterion; 
and the differences between Class B and Class C show 98.6 
and 99.7 chances in 100, respectively, of a true difference on
these same scales. 
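
The arithmetic behind these figures can be sketched as follows. The standard-error formula for two independent groups is an assumption, since the article does not print the formula it used, and the conversion to "chances in 100" is taken here as the one-sided normal probability, which reproduces the two quoted values. The inputs for the Class A and Class B comparison on Scale III (means 44 and 55, S.D. 21 and 19, N 183 and 91) are the figures given in Tables 2 and 3.

```python
from math import sqrt
from statistics import NormalDist


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the S.E. of that difference.

    Assumes independent groups: S.E. = sqrt(sd_a^2/n_a + sd_b^2/n_b).
    """
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return abs(mean_a - mean_b) / se_diff


def chances_in_100(cr):
    """One-sided normal probability for a critical ratio, as chances in 100."""
    return 100.0 * NormalDist().cdf(cr)


# Class A vs. Class B on Scale III, using the rounded tabled values.
cr_a_b = critical_ratio(44, 21, 183, 55, 19, 91)
print(round(cr_a_b, 2))  # close to the tabled 4.3; inputs are rounded

# Class B vs. Class C critical ratios on Scales II and III.
print(round(chances_in_100(2.2), 1))  # 98.6
print(round(chances_in_100(2.8), 1))  # 99.7
```

A ratio recomputed from rounded means and standard deviations will not match the printed table to the last decimal, so small discrepancies are expected.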

It seems safe to conclude, therefore, that towns of medium 
size provide a significantly better background for the development
of social maturity and extrovertive social relationships
than either a city or a very small community. Speculation
concerning the complex factors operating to produce these 
differences would be interesting but highly subjective. 

At least one factor affecting the home and family adjustments
would seem, however, to be less complicated. One
large group of questions contained in this sub-scale relates 
to possible maladjustments arising from the young person’s 
efforts to establish his social and personal independence. It 
seems logical to suppose that these areas of family relation¬ 
ship are subject to greater strain in either a metropolitan cen¬ 
ter or a small town than in a medium-sized urban community. 
The “temptations" of city life have been discussed perhaps 
far too much in our recent sociological literature, but the op¬ 
portunity and motivation toward social independence offered 
by the recreational, social, and even school-sponsored activi¬ 
ties of a city are too obvious to require elaboration. Any 
effort of the parents to counteract these influences will almost 
inevitably lead to family conflict. 


In the rural home, on the other hand, it seems quite probable
that the source of conflict lies in the moral conservatism
which is generally assumed to be characteristic of farm parents.
We are here assuming, of course, that towns under
2500 resemble rural areas in their mores and attitudes, an
assumption which is probably justified. If this is the case,
rural and small town students undoubtedly find themselves in
conflict with their parents over proposed activities which would
receive no frown of disapproval from the city parent.

The authors are aware that the foregoing paragraphs of 
interpretation involve several assumptions with which the 
reader may disagree. The data are clear-cut in their exposition
of the facts; the interpretation of these facts must of
necessity be somewhat subjective. 

On a third sub-scale, number IV, Emotionality, the data
indicate differences approaching statistical significance between
students of Class A and Class B, and between Class B and
Class C. These differences are probably related to the correlations
presented in Table 4. Since the only correlations in
this table above .50 are those between Scales II and IV and
Scales III and IV, at least some of the factors which influence
Scales II and III must influence Scale IV in a similar direction.
It appears probable that the differences on Scale IV are dependent
on these relationships to some extent.

Inter-Relationships Between Scales 

The data of Table 4 are of special interest on two points. 
The similarities of the correlations based on the scores of 
Minnesota students and of South Carolina students are strik¬ 
ing. The critical ratios of the differences of these various 
correlations were calculated, and of the 20 comparisons thus 
made, only one critical ratio, that between Scales II and IV 
for men, was above 2.0, and this one did not reach significance. 
As the table indicates, the range of average correlations for 
the four groups studied is only .02. 
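
One way to test the difference between two independent correlations is Fisher's z transformation, sketched below. This is an assumed method, not necessarily the one the authors applied, and the Minnesota men's sample size is read here as 577, which is also an assumption. The correlations are the Scale II with Scale IV values for men (.53 for Minnesota, .39 for South Carolina, N = 241).

```python
from math import atanh, sqrt


def correlation_critical_ratio(r1, n1, r2, n2):
    """Critical ratio for the difference between two independent correlations.

    Uses Fisher's z = atanh(r), whose standard error is 1 / sqrt(n - 3).
    """
    z1, z2 = atanh(r1), atanh(r2)
    se_diff = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return abs(z1 - z2) / se_diff


# Scale II with Scale IV, Minnesota men vs. South Carolina men.
# n1 = 577 is an assumed reading of the Minnesota sample size.
cr = correlation_critical_ratio(0.53, 577, 0.39, 241)
print(round(cr, 2))
```

Under these assumptions the ratio falls above 2.0 but below the 3.0 criterion, in line with the statement that this one comparison did not reach significance.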

TABLE 4

INTER-CORRELATIONS OF THE FIVE SUB-SCALES OF THE
MINNESOTA PERSONALITY SCALE

                      Minnesota    S. Carolina   Minnesota    S. Carolina
                      Men          Men           Women        Women
Correlation           (N = S77)    (N = 241)     (N = SS7)    (N = 144)

Scale I with II       .41          .37           .36          .31
      I with III      .26          .33           .34          .26
      I with IV       .38          .36           .38          .34
      I with V        .21          .22           .18          .18
Scale II with III     .25          .36           .26          .32
      II with IV      .53          .39           .48          .51
      II with V       .17          .19           .13          .11
Scale III with IV     .52          .56           .54          .58
      III with V      .24          .25           .16          .17
Scale IV with V       .21          .13           .15          .28

Average               .32          .32           .30          .31

The second point of interest concerns the relationships
of Scale IV to other scales in the test. This sub-scale measures
emotional stability, and includes many questions usually
found in an inventory of neurotic traits. The fact that it reflects
and is reflected in other areas of personal adjustment is,
therefore, quite in harmony with repeated observations of
clinical psychologists. As has been pointed out, the only correlations
above .50 in the table involve this sub-scale. The
average inter-correlation of this scale with all other scales, for
all groups involved, is .40. The average inter-correlations
of the other scales, exclusive of their correlation with Scale
IV, are .29, .27, .27, and .18, respectively, for Scales I, II,
III, and V.
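
These averages can be rechecked directly from the Table 4 entries. The sketch below assumes simple unweighted means of the tabled correlations across the four groups, which is a guess at the authors' procedure; the helper names are illustrative only.

```python
# Table 4 correlations, in column order: Minnesota men, S. Carolina men,
# Minnesota women, S. Carolina women.
TABLE_4 = {
    ("I", "II"): (0.41, 0.37, 0.36, 0.31),
    ("I", "III"): (0.26, 0.33, 0.34, 0.26),
    ("I", "IV"): (0.38, 0.36, 0.38, 0.34),
    ("I", "V"): (0.21, 0.22, 0.18, 0.18),
    ("II", "III"): (0.25, 0.36, 0.26, 0.32),
    ("II", "IV"): (0.53, 0.39, 0.48, 0.51),
    ("II", "V"): (0.17, 0.19, 0.13, 0.11),
    ("III", "IV"): (0.52, 0.56, 0.54, 0.58),
    ("III", "V"): (0.24, 0.25, 0.16, 0.17),
    ("IV", "V"): (0.21, 0.13, 0.15, 0.28),
}


def pairs_with(scale, without=None):
    """Scale pairs that involve `scale`, optionally skipping pairs with `without`."""
    return [
        pair for pair in TABLE_4
        if scale in pair and (without is None or without not in pair)
    ]


def mean_r(pairs):
    """Unweighted mean of all tabled correlations for the given pairs."""
    values = [r for pair in pairs for r in TABLE_4[pair]]
    return sum(values) / len(values)


scale_iv_average = round(mean_r(pairs_with("IV")), 2)
other_averages = {
    scale: round(mean_r(pairs_with(scale, without="IV")), 2)
    for scale in ("I", "II", "III", "V")
}

print(scale_iv_average)   # 0.4
print(other_averages)     # {'I': 0.29, 'II': 0.27, 'III': 0.27, 'V': 0.18}
```

The recomputed values agree with the figures quoted in the text, which also supports the reading of the individual table entries.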

Summary and Conclusions 

The Minnesota Personality Scales for men and women
have been administered to 241 freshman men and 144 freshman
women at the University of South Carolina. The scores
of these students have been compared with the norms data
published for the scale, based on the scores of 1,083 freshman
men and 888 freshman women at the University of Minnesota.
The total scale consists of five sub-scales which measure
Morale, Social Adjustment, Family Relations, Emotionality,
and Economic Conservatism.

The data herein presented support the following
conclusions:

(1) South Carolina students, both men and women, ob¬ 
tained scores indicating a significantly better adjustment in 
Morale and in Family Relations than those of the Minne¬ 
sota students. There is some evidence that the South Caro¬ 
lina students are superior in Social Adjustment, though this 
conclusion is not clearly substantiated. 

(2) South Carolina students from towns of 2500 to 25,000
population (Class B) appear somewhat better adjusted
on all scales than students of towns of 2500 or less (Class
C), and the latter slightly better adjusted than students from
cities of 25,000 and over (Class A). These differences reach
significance between Class A and Class B in Social Adjustment
and in Family Relations, and approach significance between
Class B and Class C on the same scales. They approach significance
between Class A and Class B and between Class B
and Class C in Emotionality.

(3) Inter-correlations between the several sub-scales
based on data from Minnesota students and from South Carolina
students are strikingly similar. The scale measuring Emotionality
shows the only inter-correlations above .50, and
shows a higher average inter-correlation than any other sub-scale.
Correlations between Emotionality and Social Adjustment
and between Emotionality and Family Relations are
above .50.

REFERENCES 

1. Darley, John G., and McNamara, Walter J. Minnesota Personality
Scale, Manual of Directions. New York: The Psychological Corporation,
1941.

2. Darley, John G., and McNamara, Walter J. Minnesota Personality
Scale (For Men). New York: The Psychological Corporation,
1941.

3. Darley, John G., and McNamara, Walter J. Minnesota Personality
Scale (For Women). New York: The Psychological Corporation,
1941.

4. Willis, Mary, et al. Cooperative General Achievement Tests, Number I.
A Test of General Proficiency in the Field of the Social
Studies, Form QR. New York: The Cooperative Test Service, 1940.




THE PLACE OF APTITUDE TESTING IN
THE PUBLIC SCHOOLS


DONALD E. SUPER
Clark University

IN THE PRACTICE of aptitude testing, three basic
assumptions are important. These assumptions have been
so well established by research in the psychological laboratories,
in the schools, and in industry that they are now generally
taken for granted and need little justification.

One assumption is that individuals differ in the extent to 
which they possess any given aptitude, some being well endowed
with the aptitude, let us say, to sing, others having
little aptitude for vocal music, and most of us being potentially 
only mediocre singers. 

The second assumption is that there are a number of 
special aptitudes, such as aptitude for musical expression, aptitude
for mechanical work, aptitude for visualizing the relations
of objects in space, scholastic aptitude, manual dexterity,
and aesthetic judgment. 

The third assumption is that there are important differ¬ 
ences in the amounts of these various aptitudes possessed by 
a given individual.

Dr. Walter Dill Scott, pioneer industrial psychologist and
until recently president of Northwestern University, has an
interesting story illustrating this point. According to personnel
data compiled by him, the most successful salesman in a wholesale
food company was also its least intelligent salesman.
Unable to reconcile these two items of information, Dr. Scott
investigated further, found that this was indeed the case, and
sought an explanation. He found that the salesman would go
into a delicatessen, let us say, and chat with the owner and
his wife. The conversation generally dealt with family and
neighborhood affairs, about which the salesman kept posted.
Finally he would get around to pickles and other items of
business. Then another man would enter the picture, a second
salesman employed to work with him, who discussed prices,
took orders, filled out blanks, and performed other clerical
tasks which the star salesman could not handle. It actually
paid the company to employ two men to do one man’s work!
Such extreme variations of abilities in one individual are the
exception, as Dr. Terman demonstrated in his "Genetic Studies
of Genius," but the extreme case illustrates a less extreme
tendency toward stabilization of aptitudes and abilities within
individuals.

These three assumptions have provided us with a basic 
philosophy of education and of guidance, together with a 
working program for the schools. Recognizing the potential
worth of each individual, it becomes incumbent upon us, as
members of a democracy, to provide for individual differences
in the children with whom we work. It is also important, if
we are to make our democratic system effective, to study the
individual differences in our pupils and to help them understand
their own abilities and interests, in order that they may
choose wisely from among the various educational offerings
provided. It is not enough to develop differentiated curricula,
as will shortly be demonstrated, unless we also provide the
means of making wise choices of curricula. It is at this point,
of course, that aptitude tests enter the picture. 

Before proceeding to discuss these last in some detail, let 
us dwell briefly on each of these two aspects of the working 
program of a democratic educational system, curricular differ¬ 
entiation and individual analysis, imposed upon us by our 
recognition of individual differences, special abilities, and trait 
differences. 

The history of American secondary education is in effect
the history of a long drawn-out and not altogether conscious
attempt to provide for individual differences. The colonial
Latin Grammar School existed to provide pre-professional education,
to prepare for college boys who were to enter the
learned professions. It was largely supplanted by the Academy,
the purpose of which was to add two new types of education
for two new types of pupils. It offered scientific and
commercial training for those who were planning to enter
technical occupations and the field of business, in addition to
academic courses. The public high school entered the picture
in the last century in order more effectively to provide these
same types of education. Its purpose was to offer pre-professional
and pre-commercial training, as an analysis of its subjects
and of the then current vocational conditions would show.
In more recent years the Industrial Arts course and the Trade
School have been developed to meet the needs of those who
are likely to enter the skilled trades. Some of the recent evaluative
studies of public education, such as the Regents’ Inquiry
in New York State, now advocate the development of a fourth
type of secondary education, a high school with a curriculum
designed to prepare youth for work in the semi-skilled trades
and for the patterns of living typical of those employed at
that level. A few schools already provide such courses.

If we analyze the trends so briefly described above, we see
at once the increasing differentiation of our educational offer¬ 
ings as a result of the recognition of individual differences 
and the demand for appropriate curricula. 

One might expect, once a reasonable variety of educational 
offerings is provided, that the distributive mechanism of a 
democratic educational system would function smoothly. Pupils 
and their parents could look over the offerings and choose 
appropriate courses, especially if given the benefit of the
advice of teachers and principals who are familiar with both
the children and the courses. The practices of many schools
have been based on this assumption. Recent years, however,
have shown that this is unwarranted, for numerous thorough
studies have indicated large numbers of young persons obtain
an education of a type not suited to their vocational prospects. 

This statement should be illustrated with concrete facts, 
for such claims are all too frequently made without adequate 
foundation. Two-thirds of our youth of high school age
attend high school, that is, are in schools which prepare for
professional, commercial, or skilled employment. But only 
one-fourth of our employed adult population is actually en¬ 
gaged in occupations of these types. This means that five- 
twelfths, or approximately one-half, of the young people whom 
we are attempting to educate are actually being prepared for 
vocations and for ways of life which they will not enter. To 
put this in another way, our young people now tend to get 
an education planned in terms of the upper half of the occu¬ 
pational and social scale, whereas most of them enter and 
remain in occupations in the lower half of the scale. Surely 
no further proof is needed that young people need vocational 
guidance, that is, help in understanding and acting on their 
own abilities, interests, and opportunities. 
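
The proportions in this argument can be checked with exact fractions. The sketch below follows the paragraph in treating the share of youth in high school and the share of adults in professional, commercial, or skilled work as fractions of the same population, which is a simplifying assumption; the variable names are illustrative.

```python
from fractions import Fraction

in_high_school = Fraction(2, 3)        # youth of high school age attending
in_matching_work = Fraction(1, 4)      # adults in the occupations prepared for

# Share of all youth prepared for work they are unlikely to enter.
surplus_of_all_youth = in_high_school - in_matching_work

# The same group expressed as a share of those actually enrolled.
surplus_of_enrolled = surplus_of_all_youth / in_high_school

print(surplus_of_all_youth)            # 5/12
print(float(surplus_of_all_youth))     # about 0.42
print(surplus_of_enrolled)             # 5/8
```

Five-twelfths is about 42 per cent of all youth of high school age; measured against those enrolled, the same group comes to five-eighths, so "approximately one-half" sits between the two ways of counting.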

Given several different types of secondary school curricula, 
and granted the need to help students decide which type of 
curriculum to pursue, we must then devise methods of in¬ 
dividual analysis which will assist them in making curricular 
and consequent vocational choices. 

Various methods suggest themselves. We may examine a 
student’s marks in the different subjects which he has studied 
and find out in which types he has done the best work. But 
this approach has at least two important limitations: the 
courses which he has taken have been limited in variety and 
in number, and teachers’ marks are frequently unreliable and 
invalid indices of the quality of the work done. Useful as
such data are in understanding a pupil, they cannot in and of 
themselves suffice for the task at hand. 

We may keep cumulative records in which are noted not 
only the pupil’s marks, but his extra-curricular activities, his 
relations with his fellows, his special interests and out-of¬ 
school activities. These can, as we know from their wide use, 
be very helpful in understanding a child. But, again, the ex¬ 
periences are likely to be limited in variety (a defect which 
can be at least partly overcome) and the evaluations made 
of these experiences are necessarily subjective. They frequently
do not permit comparisons with other persons, and
their real significance for curriculum and for vocations is too 
often not clear. 

Perhaps a digression is desirable to illustrate this last 
point, namely, the doubtful nature of some of the relationships 
which we think we see between hobbies and school subjects or 
vocations. It is widely assumed among philatelists that as a 
result of collecting stamps they learn a good deal more con¬ 
cerning history, geography, and related subjects than they 
otherwise would. To check up on this assumption, a series of
studies of adolescent and adult stamp collectors were made. 
They were given tests of intelligence, of achievement in the 
social studies, and of technical philatelic matters. The same 
tests were given to a control group of non-philatelists. We 
found that the adult stamp collectors had learned a great deal 
more about stamp collecting, a little more about strictly 
factual aspects of geography (such as names of capitals), and 
nothing more about significant social problems. The adolescent
stamp collectors learned nothing but the technology of philately
from their hobby. These and other studies suggest that
information concerning a pupil’s activities must be used with
caution as an index of aptitude, of achievement, or of interest 
in supposedly related fields. 

It should be clear that aptitude tests are needed as a sup¬ 
plement to these other not too effective methods of analyzing 
human abilities, interests, and achievements. They are needed
because, when well constructed and wisely used, they are ob¬ 
jective, because they make possible comparisons between peo¬ 
ple, and because their curricular and vocational significance 
can be established with relative ease. These three concepts 
of quantification, reliability, and validity are now so generally 
familiar that they need not be elaborated. 

We may ask next: What aptitude tests should one use 
in a school, and at what age should they be used? Before answering
that question we must find the answer to another:
What do we want to measure, and when must we measure it? 




To reply to these questions in general terms at first, we
want to measure those characteristics which are important in
success at a given stage of a child’s educational or vocational
career some time before he enters that stage, in order that he
may plan for it with wisdom. This means that different types
of traits and abilities may well be measured at different ages
and stages, as life’s decisions make those data desirable.

What are these stages when decisions are being made? 
One of the first, obviously, is when the pupil leaves the ele¬ 
mentary or junior high school to enter high school. Another 
is when he leaves high school to enter a vocation or a college. 
Still another is when he leaves college to enter an occupation.
If our schools are in fact pre-vocational, the data needed at
one stage are substantially the same as those needed at 
another. 

When a student leaves the lower school to enter high 
school, he has to make decisions concerning the type of high 
school and the type of curriculum, concerning the specific
course within the curriculum, and, since the curricula are in a 
sense pre-vocational, concerning the general family of occupa¬ 
tions which he wishes to enter. These decisions must be based 
on the abilities of the pupil as they relate to the requirements 
of the courses. Tests should therefore be selected so that they 
will tap the various aptitudes, interests, and achievements 
which make for success and satisfaction in those courses. 

To profit from the college preparatory course a pupil 
should have high average or superior mental ability, for much 
of the content of the course is abstract; an extensive vocabu¬ 
lary, since the subject matter is contained in books and since 
its exercises are generally verbal; superior reading speed and 
comprehension, for the same reasons; ability to work with 
numbers, since numerical symbolism and manipulation are im¬ 
portant in many subjects, especially for the future technologist; 
interest in the why, how, and whence of things and of ideas, 
since of such is the content of most academic subjects and the 
basis for most of the professions. To the prospective college
preparatory or academic student one would, therefore, want
to give tests of scholastic aptitude, vocabulary, reading, mathematics
and other academic subjects, and interests.

Are such tests available for people as young as 14 or 15? 
Intelligence or scholastic aptitude tests have, of course, long 
been in use at all ages. Achievement tests in the tool subjects
are equally well standardized and validated. Both can be given 
by teachers with a minimum of training in test techniques and 
can be scored for relatively little money. Interest tests are 
not so well developed at this level, but there are at least two, 
and probably three or four, which can be given to early adolescents
with some confidence if the results are to be used by
competent people. Scoring may run into more money, but the 
cost is not necessarily prohibitive. 

Singularly little attention has been given in most localities
to the qualities needed for success in the commercial course,
although this has to some extent been remedied in more recent
years. Again, intelligence or scholastic aptitude plays a part,
although it need not be present in the same degree as in the
college preparatory course. A good deal of subsequent disappointment
would probably be avoided if most pupils with
I.Q.'s of less than 105 or 110 could be motivated to choose
courses other than the academic, and if most of those with
I.Q.'s of less than 95 or 100 could similarly be guided (but not
coerced) into courses other than the commercial. A moderately
good vocabulary, reading ability, and mathematical
achievement are required here too, although the minimum requirement
is somewhat lower than that of the academic course.
The interests of commercial employees are different from those
of professional people not in degree, as has been true of their
abilities, but in kind. To them questions such as why, how,
and whence are less important. To be specific, they are more
interested in people as friends, figureheads or freaks than as
organisms motivated by needs and drives and acted upon by
forces in a human and material environment; a mountain is
something to admire, to picnic on or to take a picture of rather
than to analyze as a manifestation of ancient geological goings-on.
Two types of special aptitude are needed for success
in clerical work, the abilities to recognize verbal and numerical
symbols with speed and accuracy. In addition, sales people
need certain personality traits which enable them to make effective
contact with customers.

The same tests that are used with the prospective academic 
pupils can be used with those who are considering commercial 
courses. The two special clerical aptitudes can be measured 
by well-proved clerical aptitude tests, even at the fourteen-year-old
level. Personality cannot so well be measured; tests
and inventories are available, some of which have their uses,
but it will be some time yet before they are worth the cost
for purposes such as those now being considered. 

For the trade courses, as for the pre-professional and
commercial, a minimum of abstract mental ability is required,
but if we may judge by the evidence available from trade
schools and from industrial research, the minimum for most
trades is lower than for the other two groups. Apparently
an I.Q. exceeding 85 or 90 is generally sufficient to enable one
to master the arithmetic and other school subjects needed in
most skilled trades, although some rate considerably higher
than most routine office jobs. Given this, certain special aptitudes
assume primary importance, the specific aptitudes and
the amounts of those needed varying somewhat from trade
to trade. More than average manual dexterity is not needed
in most skilled trades, but for those in which it is required,
it can be tested. What appears to be most important both in
learning a trade and in practicing it is mechanical aptitude or
insight, a special ability which is independent of scholastic
aptitude. It can be measured fairly well by means of several
paper and pencil tests and by performance tests. Of equal
importance (and probably underlying the former) is ability to
visualize spatial relations; that is, to judge the relationships
of shapes and sizes in work such as machine shop and drafting.
This ability can be measured by good group and individual
tests. Finally there is interest, which must be considered in
this as in other fields. The interests of persons in trade schools
and in the skilled trades tend to resemble those of people in
technical schools and in scientific occupations, but on a lower
mental level. They like the concrete and the practical; they
prefer to work with objects which they can manipulate and
transform rather than with abstract problems, with records,
or with people. These interests, and the aptitudes mentioned
above, can be measured fairly satisfactorily in early adolescence.

We will in time pay more attention to aptitudes needed 
for education for the semi-skilled and unskilled occupations. 
Again, minimum and maximum levels of mental ability will 
need to be taken into account, and these will be lower than 
for the other types of curricula. Achievement in the tool subjects
of the school will have less vocational importance.
Mechanical aptitude will not play a prominent part. Manual
dexterity and ability to visualize spatial relations will vary
considerably from one kind of job to another. Physical
strength and stamina will play more part in unskilled work, less
in semi-skilled. The interests of people who enter these occupations,
if we may judge by the not too adequate data now
available, are not clearly differentiated. Apparently they have
little in the way of special educational or occupational interests.
We must study them more intensively in order to find out
what does really challenge and appeal to them if we are to
devise satisfactory curricula for these groups. As we learn
more about them we will develop more adequate tests for 
working with them, especially in the field of interests. The 
other characteristics can be measured reasonably well at 
present. 

A very important objection is not infrequently raised at
this point. Assuming that we use these tests and obtain such
information about our pupils, how are we going to get them
to act upon it? Are we going to tell them what they can and
what they cannot do? Are we going still further and tell them
what they may and may not do? Is such action in line with
the democratic philosophy which is one of our basic
assumptions?

The answer lies in pointing out that tests can be used for
either of two purposes: guidance or selection. The necessity
for questions such as the above arises from a confusion of the
two, and a consequent misunderstanding of the former.
Guidance, or counseling, consists of helping a person to gain insight,
to develop self-understanding. Selection involves choosing
those who have the desired characteristics and offering them
the opportunity in question. In a democratic society we must
do both, but the processes must not be confused.

The vocational and educational counselor has, as his
function, helping youth to obtain experiences which will give them
insight into their abilities and interests. Taking aptitude tests 
is one such experience. The counselor is concerned also with 
helping him to evaluate these experiences. Discussing the test 
results and their significance, as shown in the experiences of 
others who have made similar scores, is the way in which the 
results of the test experience are evaluated. The counselor 
does not say that you can, or you cannot, do thus and so; he 
shares with the youth information to the effect that such and 
such a percentage of other youth who made scores comparable 
to his did or did not do thus and so. The significance of 
these experience tables must then be discussed, and the youth 
must make his decision on the basis of the insight thus gained. 
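
The experience table described above can be illustrated with a short sketch. The records, the ten-point score bands, and the function name are hypothetical and serve only to show the form of such a table; they are not drawn from any actual school's data.

```python
from collections import defaultdict

BAND_WIDTH = 10  # hypothetical width of each score band


def experience_table(records, band_width=BAND_WIDTH):
    """Map each score band to the percentage of past pupils who succeeded."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [successes, total]
    for score, succeeded in records:
        band = (score // band_width) * band_width
        counts[band][1] += 1
        if succeeded:
            counts[band][0] += 1
    return {band: round(100 * s / n) for band, (s, n) in sorted(counts.items())}


# Hypothetical past records: (aptitude score, completed the course?)
records = [(42, False), (47, False), (48, True), (55, True), (51, False),
           (58, True), (63, True), (66, True), (61, False), (69, True)]

table = experience_table(records)
for band, pct in table.items():
    print(f"scores {band}-{band + BAND_WIDTH - 1}: {pct}% completed the course")
```

A counselor would read off the band containing the pupil's score and discuss the percentage with him; the table itself makes no decision.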

The school using aptitude tests for selection is faced with 
another problem. It, too, has experience tables, to use the 
insurance term, or norms, to use the educational. Having 
tested an applicant, it says to him: “You have characteristics
which suggest that you will be successful in this line of
training and work: we will admit you as a student”; or it says:
“Experience shows that most students with your characteristics
do not complete our course, so we do not feel justified in
investing time and money in giving you this training.” Thus,
society protects itself and its resources, and experience teaches
the individual to make a wiser choice. 




Such uses of tests imply the existence of two basic
conditions: tests which are thoroughly standardized, and test users
who know their tools. To administer and to score most tests
is relatively easy. To interpret them wisely requires great
skill, considerable specialized knowledge, and profound
wisdom ripened by experience.

A few brief words of summary may be helpful in closing. 
The place of aptitude testing in the public schools is the place 
at which choices need to be made. It is the place at which 
objective data are needed to provide a basis for those choices. 
And it is the place at which a trained, skillful, and wise
counselor is available to assist in evaluating the data on which those
choices should be based. 






EFFECT OF ENGINEER SCHOOL TRAINING ON 
THE SURFACE DEVELOPMENT TEST 


RUTH D. CHURCHILL, JEANNE M. CURTIS, CLYDE H. COOMBS,
AND THOMAS W. HARRELL, 1st Lt., A.G.D.

Personnel Procedures Section, The Adjutant General's Office

FAUBION, CLEVELAND, AND HARRELL¹ report
that six weeks of “intensive training in mechanical courses
does not significantly increase mechanical aptitude test scores,
even where the test is very similar to the activities carried out
in the training. This is strikingly true of the Surface Development
Test, in which the items resemble mechanical drafting
and blueprint reading work.”

An analysis of the effects of nine weeks’ training at an 
Enlisted Men’s Engineer School gives contradictory results 
for the Surface Development Test. In this case, there are 
significant increases in scores on the second administration of
the Surface Development Test after nine weeks’ training. 

The content of the Surface Development Test used in this 
study is similar to that of the one used in the previous study;
it involves matching drawings in two dimensions and in three-
dimensional perspective. 

Since the same form of the test was used both at the
beginning and at the end of the training, the increases in scores
may be attributed to two factors: practice effect and the actual
training received in the course. The tests were not given to a
control group which received no training, but it is possible to
compare the increases in scores of two different classes at the
school. Since the training received in the Drafting Class is
closely associated with the problems of the Surface Development
Test, this class can be used as the experimental group.
The instruction in the Water Purification Class covers the
principles and applications of electricity and automotive
mechanics as well as water purification. Presumably this
material is little related to the abilities involved in the Surface
Development Test, so that this class can be used as a control
group. Table 1 compares these two classes with respect to
their increases in scores after nine weeks’ training.

¹ R. W. Faubion, E. A. Cleveland, and T. W. Harrell, “The Influence of
Training on Mechanical Aptitude Test Scores,” Educational and Psychological
Measurement, II (1942), 91-94.

TABLE 1

MEAN SURFACE DEVELOPMENT TEST SCORES OF THE TWO CLASSES BEFORE
AND AFTER NINE WEEKS’ TRAINING

                                   No.   Mean₁   Mean₂       D     σD    D/σD

Drafting                            66   71.89   93.12   21.23   1.66   12.79
Water Purification                  60   52.48   64.18   11.70   1.45    8.07
Drafting vs. Water Purification          19.41   28.94    9.53   2.20    4.32


The gain made by the Drafting Class on the Surface
Development Test is significantly greater than that made by the
Water Purification Class. Both classes also took tests of
mechanical information and comprehension before and after
the nine weeks’ training. The content of these tests is not
related to drafting, although it may be to water purification.
The two classes made small but significant gains on both tests;
the Water Purification Class gained more than the Drafting
Class, significantly so on the mechanical information test. It
may be inferred, therefore, that, on the Surface Development
Test, the greater gain of the Drafting Class as compared with
the Water Purification Class is a result of the content of the
drafting course.
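
The ratios in the last column of Table 1 can be checked from the printed gains and standard errors. The sketch below is illustrative only. It assumes that the standard error of the difference between the two classes' gains is the root of the sum of the squared standard errors (that is, that the classes are independent); the paper does not state how that figure was obtained, although this assumption reproduces the printed 2.20.

```python
import math

# Gains (D) and their standard errors as printed in Table 1.
drafting_gain, drafting_se = 21.23, 1.66
water_gain, water_se = 11.70, 1.45


def critical_ratio(difference, standard_error):
    """Difference divided by the standard error of that difference."""
    return difference / standard_error


cr_drafting = critical_ratio(drafting_gain, drafting_se)
cr_water = critical_ratio(water_gain, water_se)

# Assumption: independent classes, so the standard errors combine in quadrature.
se_between = math.sqrt(drafting_se ** 2 + water_se ** 2)
cr_between = critical_ratio(drafting_gain - water_gain, se_between)

print(f"Drafting gain ratio:            {cr_drafting:.2f}")
print(f"Water Purification gain ratio:  {cr_water:.2f}")
print(f"SE of difference between gains: {se_between:.2f}")
print(f"Ratio for difference in gains:  {cr_between:.2f}")
```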

The most probable explanation of the contradictory results
of the two studies lies in the difference in the amount and
intensity of the training which each group received. For the
airplane mechanics, mechanical drafting and blueprint reading
was only one out of five courses; over a period of six weeks,
they received 40 hours’ instruction in this subject. The
Drafting Class at the Engineer School, however, studied nothing
but drafting and had almost 400 hours’ training in the various
phases of that subject.






AN AID TO STUDENT COUNSELORS 


RALPH F. BERDIE 
University of Minnesota 

Much time is spent in the counseling interview
establishing rapport between the interviewer and the
student and diagnosing problems of varying complexity. Only
after the counselor has obtained clues to and adequately
diagnosed the problems of the student may actual therapeutic
work proceed. In searching for these clues the counselor often
spends a great deal of time asking questions and persuading
students to talk about their activities and past experiences.
Poor achievement or general dissatisfaction on the part of the
student may suggest to the counselor the existence of a
problem, but he must then determine if the student is worrying
specifically about his health, his inability to get along with his
father or his meager social life. When this has been done,
the student can then be helped to do something about his
problem.

The student who comes to the counselor usually has a
complaint. He comes because he is having difficulty in
choosing a vocation or is failing his chemistry or is running out of
money. These expressed problems demand the attention of
the counselor and may provide a starting point for his
interview. Most often these complaints are only symptomatic of
other problems or else are generalized expressions of several
other problems. A student claiming vocational indecision may
actually be suffering from lack of information regarding his
own abilities and characteristics, a paucity of vocational
information and paternal pressure urging him toward a distasteful
occupation. A student having trouble with his school work
may actually be suffering from poor study habits, inadequate
reading skills, and too much outside work. After recognizing
a general problem the counselor must make a more specific
diagnosis and then initiate treatment.

Many students approach the counselor with a particular 
orientation dependent upon prevailing stereotypes associated 
with the counseling program. They come for vocational or 
educational advice without even considering that they may be 
able to receive help with some of their other problems. A 
student may come to the counselor for assistance with his 
study methods and never think that he might possibly learn 
how to handle an unpleasant family situation nor realize he 
should try to do anything about it. He has thought of the 
counselor as serving only one of the several purposes actually 
served by that counselor. 

To assist the counselor in his diagnosis and to suggest
to the student the various functions of the counselor, a
problem check list has been developed at the Testing Bureau at
the University of Minnesota and used successfully for over
one year. The check list consists of thirty-three statements
of various problems encountered frequently in student
counseling. These problems were obtained from books on
counseling (3), (4), and from a survey of case histories of
students. The purpose of the list is to facilitate the interview
processes and to assist the counselor in determining what
problems the student faces. It provides an opportunity for the
counselor to approach problems that are often difficult to
bring up in the interview and gives the student an opportunity
to consider what he wants to talk about before the actual
interview.

A more extensive problem check list has been published
by Moody (1). His longer list may prove more useful in
counseling situations which do not provide a great deal of
other information about students. Where much information
is obtained through interviews, tests, and questionnaires, a
shorter check list of problems is more economical and perhaps
more useful in the interview. Wrenn (5) has published a
check list to help the counselor in diagnosing and treating
problems centering around study habits. Symonds has also
done extensive work involving a check list of problems of
adolescents and others (2).

The problem check list was included in the individual
record form used at the Testing Bureau and given to the
students before the counseling interview. Directions to the
student were as follows:

Everyone faces problems throughout his life. Some of
these problems cannot be solved without help. Many times
they are very easily solved. At other times they are solved
only after much effort. Below are a list of problems with
which young people are often concerned. After those
problems you have not been able to solve adequately, place a check
(√). After those problems which you would like to discuss
with a counselor, place a double check (√√). These will help
us to be of greater assistance to you.

The responses of 208 men students and 119 women
students were tabulated to determine the number of students
checking each problem. The number and percentage of men
and women checking and double-checking each of the items
are presented in Table 1.

Over one-half of the men and women coming to the
Testing Bureau desired to discuss what they were best able to do.
Slightly less than one-half wanted to discuss what they would
like to do. Students coming for counseling express great
concern with their abilities and interests, as well they might. The
two other items students most wished to discuss were their
study habits and the training requirements for their chosen
occupations. Students and faculty members have tended to
place great emphasis upon the educational and vocational
services offered by the Testing Bureau, and problems in these
areas are the ones which students are most ready to bring
to the counselors.

TABLE 1

NUMBER AND PER CENT OF 208 MEN AND OF 119 WOMEN WHO PLACED SINGLE AND
DOUBLE CHECKS OPPOSITE EACH OF THE PROBLEMS

                                                              Men                  Women
                                                        Single    Double      Single    Double
                                                         Check     Check       Check     Check
Problem                                                 No.    %  No.    %    No.    %  No.    %

 1. I usually feel inferior to my associates             30   14   10    5     23   20    5    5
 2. I have been unable to determine how much time I
    should study                                         30   15   31   15      9    8   14   12
 3. I have too few social contacts                       31   15    5    2     12   10    6    5
 4. I have difficulty in making friends                  15    7    1  0.5      6    5    4    3
 5. I do not know how to obtain the money I need         14    7   11    5      5    4    4    3
 6. I have been unable to determine what I am best
    able to do                                           50   24  106   51     17   14   57   43
 7. I do not know how to take good lecture notes         58   28   29   14     20   17   17   14
 8. I do not get along well with my parents               9    4    0    0      8    7    0    0
 9. I often have difficulty in keeping friends            8    4    1  0.5      1    1    1    1
10. I am unable to determine what I would like to do     30   14   72   35     20   17   42   35
11. I have not obtained parental approval of my
    vocational plans                                      9    4    1  0.5      2    2    2    2
12. I do not have enough to talk about in company        36   17    7    3     21   18    3    3
13. I receive inadequate financial help from my family    7    3    2    1      3    3    2    3
14. I do not know how to outline text-book assignments   26   13   12    6      4    3    9    8
15. I am unable to get along with my brothers and/or
    sisters                                               3    1    1  0.5      3    3    0    0
16. I have been unable to make a satisfactory
    religious adjustment                                 15    7    1  0.5      9    8    1    1
17. I am not interested in my studies                    13    6    3    1      1    1    3    3
18. I do not have enough information about job
    opportunities and duties                             24   12   27   13     15   13   10    8
19. I am frequently embarrassed when with others         17    8    1  0.5     10    8    2    2
20. I usually do not enjoy being with members of the
    opposite sex                                         10    5    3    1      4    3    1    1
21. I am unable to do my work well because of too many
    social activities                                     9    4    2    1      4    3    1    1
22. I usually do not know how to act in company          10    5    0    0      0    0    0    0
23. I usually cannot read fast enough to cover all of
    my assignments                                       23   11   10    5     10    8    1    1
24. I usually have difficulty understanding what
    I read                                               19    9    4    2     12   10    1    1
25. I do not know what the most appropriate training
    is for my chosen career                              17    8   39   19      7    6   24   20
26. I do not know if an education is worth while          5    2    6    3      2    2    0    0
27. I feel guilty about something I have or have not
    done                                                 14    7    1  0.5      8    7    3    3
28. I have so much outside work to do that I am
    neglecting my school work                             3    1    3    1      1    1    1    1
29. I have trouble making myself study                   51   25   19    9     12   10    9    8
30. I lack self-confidence                               35   17    9    4     29   24   10    8
31. I am dissatisfied with my state of health            12    6    2    1      4    3    0    0
32. I do not know how to improve my personal
    appearance                                            5    2    0    0      1    3    0    0
33. I do not know how to break certain habits I have     12    6    1  0.5      4    3    1    1

Comparison of the numbers of students checking and
double-checking each item reveals that many students are aware
of problems which they are not eager to discuss with a
counselor. Many students feel inferior to their associates but do
not express a desire to discuss this with a counselor. Many
state that they lack self-confidence. Many consider that they
have too few social contacts but would not like to talk to a
counselor about this. In view of the many techniques
available for the counselor in dealing with the social problems of
students at the University, the students appear to be turning
away from possible assistance. Relatively more students
single-check items related to reading problems than double-check
these items. Inspection of these figures shows that students
are aware of their educational and vocational problems and
are frequently willing to talk about them. Although they
often recognize social problems and personal problems they
have little desire to discuss these with their counselors.
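
The contrast between vocational and social items can be put in numbers with a small sketch using the men's counts printed in Table 1. It assumes that a student marked each item with either a single or a double check, never both, so that the two counts can be added; the shortened item labels and the function name are illustrative.

```python
# (single checks, double checks) among the 208 men, as printed in Table 1.
men_checks = {
    "best able to do": (50, 106),
    "feel inferior to associates": (30, 10),
    "too few social contacts": (31, 5),
    "lack self-confidence": (35, 9),
    "cannot read fast enough": (23, 10),
}


def double_check_share(single, double):
    """Share of students marking an item who also asked to discuss it."""
    return double / (single + double)


shares = {item: round(double_check_share(s, d), 2)
          for item, (s, d) in men_checks.items()}

for item, share in shares.items():
    print(f"{item:30s} {share:.2f}")
```

On these figures roughly two-thirds of the men who marked the vocational item wished to discuss it, against about one in seven of those who marked too few social contacts.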

This reluctance to discuss certain types of problem may be 
due to the fact that the students think that nothing can be 
done about these problems and that consequently time would 
be wasted in discussing them with a counselor. They may 
consider their personal problems too private to discuss with 
a relative stranger, but this would hardly explain the few 
double checks opposite the reading problems. When students 
come to the counselor, they come with one primary purpose 
and all other matters may appear irrelevant at that time. 
Coming to a counselor may follow or accompany some crisis 
in the life of a student, a failure or a forced change of
vocational plans, and this crisis may envelop the entire horizon
of the student. 

The students included in this study had also filled out the
Minnesota Personality Scale. On this test scores are available
for morale, social adjustment, family adjustment, emotional
adjustment, and economic conservatism. The score on
the morale section is related to the individual’s emotional
acceptance of surrounding social and community situations.
Very high scores may indicate naive optimism, low scores
cynicism or lack of hope for the future. Scores on the social
adjustment section are related to the social maturity,
gregariousness, and socialization of the individual. High scores on
the family relations section indicate friendly and healthful
child-parent relations. Scores on the emotionality section are
related to emotional stability. Low scores on this section
may often result from hypochondriasis, anxiety states, or
overreactive tendencies. Scores on the economic conservatism
section are related to the liberality of the individual’s economic
attitudes.

On the basis of selected personality test scores,
comparisons were made between the means of those students who
had checked items and those who had not checked those same
items. For example, of the 193 men for whom complete data
were available, 11 checked the item, “I do not know if an
education is worth while.” This item was left unchecked by
182 students. The mean personality scores for groups
checking and not checking the items and their critical ratios
(difference divided by standard error of the difference) are
presented in Table 2. The items have been grouped into
functional categories on an inspectional or logical basis. Only those
items have been included which appeared relevant to the scores
on the Personality Scale.
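
The critical ratio defined above can be sketched from raw scores as follows. The scores are invented for illustration, and the standard error of the difference is assumed to be computed from the two group variances as for independent samples; the article does not give its exact formula.

```python
import math
from statistics import mean, variance


def critical_ratio(checked, not_checked):
    """Difference between group means divided by its standard error.

    Assumes independent groups: SE = sqrt(s1^2 / n1 + s2^2 / n2).
    """
    difference = mean(not_checked) - mean(checked)
    se_difference = math.sqrt(variance(checked) / len(checked)
                              + variance(not_checked) / len(not_checked))
    return difference / se_difference


# Hypothetical social adjustment scores, not taken from the study.
checked_item = [180, 195, 170, 205, 190]
did_not_check = [220, 235, 210, 225, 240, 215]

cr = critical_ratio(checked_item, did_not_check)
print(f"critical ratio = {cr:.2f}")
```

A positive ratio here means the students who checked the item scored lower on the section than those who did not.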

TABLE 2

COMPARISON OF MEAN SCORES ON PERSONALITY SCALE OF STUDENTS CHECKING AND
NOT CHECKING PROBLEMS

                                                        Mean of       Mean of
                                                        Students      Students
Section of Personality Scale                            Who Checked   Who Did NOT   Critical
and Problem                                             Item          Check Item    Ratio

MEN

Morale
  I do not know if an education is worth while          156.55        [illegible]   [illegible]
Social
  I have too few social contacts                        196.83        223.66        4.56
  I have difficulty in making friends                   184.88        221.71        3.72
  I do not have enough to talk about in company         192.19        226.01        5.74
  I am frequently embarrassed when with others          182.06        222.19        3.70
  I usually do not enjoy being with members of the
    opposite sex                                        187.54        220.90        2.94
  I am unable to do my work well because of too
    many social activities                              239.80        217.50        2.86
  I usually do not know how to act in company           177.10        220.92        2.98
  I lack self-confidence                                197.95        224.07        4.10
Family
  I do not get along well with my parents                99.66        120.06        2.59
  I have not obtained parental approval of my
    vocational plans                                    109.78        119.56        1.44
Emotional
  I have been unable to make a satisfactory
    religious adjustment                                134.36        129.77         .94
  I am frequently embarrassed when with others          119.41        131.13        2.20
  I usually do not enjoy being with members of the
    opposite sex                                        128.23        130.23         .42
  I feel guilty about something I have or have not
    done                                                118.57        131.00        2.26
  I lack self-confidence                                121.70        132.29        3.15
  I am dissatisfied with my state of health             116.85        131.06        2.39

WOMEN

Social
  I have too few social contacts                        182.47        201.19        2.29
  I have difficulty in making friends                   162.56        201.90        5.20
  I have not enough to talk about in company            177.00        203.96        3.99
  I am frequently embarrassed when with others          179.09        200.78        2.16
  I lack self-confidence                                179.97        207.58        4.76
Family
  I do not get along well with my parents               102.38        139.02        6.76
Emotional
  I have been unable to make a satisfactory
    religious adjustment                                161.29        163.70         .29
  I am frequently embarrassed when with others          146.64        165.55        3.10
  I feel guilty about something I have or have not
    done                                                148.10        165.19        2.34
  I lack self-confidence                                158.55        165.89        1.51

On each item related to social behavior, the group checking
the item had significantly lower scores on the social adjustment
test than did the group not checking the item. The
men who indicated that they had too many social activities,
however, had higher scores on the social adjustment section,
as would be expected. Since many of the problems on the
check list are very much like some of the items on the test,
the observed relationship is not surprising. Students who
claim that they do not get along well with their parents obtain
significantly lower scores on the family relations section
of the personality scale than students who do not check this
item. However, men whose parents do not approve of their
vocational choice do not obtain significantly lower scores on
this section. Students who claim that they have been unable
to make a satisfactory religious adjustment make no lower
scores on the emotional adjustment scale than do other
students. Guilt feelings apparently are related to the score on
this section, and both men and women who check this item
obtain significantly lower scores than those who do not check
it. Men who claim dissatisfaction with their state of health
obtain lower emotional adjustment scores, as do men who
claim to lack self-confidence. Women who indicate a lack
of self-confidence, however, do not differ significantly on the
basis of this scale from other women.

Of the 27 relationships analyzed, 21 were found to be
statistically significant. Students tending to obtain low scores
on the various sections of the personality scale will also tend
to check related items on a problem list and thus supply the
counselor with a clue regarding the source of these low scores.
Perhaps the same information could be obtained by going
through the items of the test, but as there are 218 items in
the test, each with five possible answers, this would require
much of the counselor’s time. A factor analysis of the items
of the test would perhaps identify a few key items which could
be used for the same purpose as the problem check list, but
until this is done, the counselor may more economically glance
at the 33 items on the check list than read through the many
items of a personality test.

Added to the statistical evidence concerning usefulness of 
the problem check list is much evidence obtained from clinical 
work involving the use of this instrument. The description
of a few cases in which it has proved useful will exemplify
this and also suggest various techniques that have proved
successful in using the list in the interview.

Joseph H. came to the Testing Bureau for assistance in
deciding upon a major in the College of Education. He was
completing his second year in the university and had been
doing slightly better than average work. He had graduated
from a small high school, and his social life had been very
restricted in the little town from which he came. Among
other items, he double-checked that he had too few social
contacts. His percentile score on the social adjustment section
of the Personality Scale was 24. After discussing the boy's
vocational plans, the counselor said, “I see that you check
that you have too few social contacts. What do you think
you could do about that?” Joseph started to discuss the
facilities available on the campus and soon he and the counselor
had a social program planned, and the counselor gave him a
letter of introduction to the secretary of the Y. M. C. A.
The item checked by the boy in this case gave the counselor
an opportunity to approach a problem the existence of which
might have been easy to determine but for which treatment
might not have been initiated so easily without the item.

George S. had checked several items on the problem check
list, including the one, “I feel guilty about something I have
or have not done.” A single check had been placed opposite
this item. During the interview the counselor decided that
the boy presented a picture of a very unstable individual and
that various personal problems might interfere with his
progress when he entered college the following fall. The
counselor was unable, however, to get the boy to discuss these
problems. Finally, he said, “I see you checked here that you
feel guilty about something you have done or have not done.”
After a pause he continued, “Many people feel guilty about
things they have done, and usually feeling guilty about it is
the only thing that does any harm.” He paused again, and
George began to speak of the problems that had been
worrying him and of his reactions to these problems. In this case,
the problem check list provided an opportunity for the
counselor to approach a problem which had previously resisted
all attempts to approach it.

We have found that when an item is double-checked, the
most convenient and profitable thing the counselor can do is
to refer directly to the item and give the student an
opportunity to elaborate upon his response. When only a single
check is placed opposite the item, however, this can seldom be
done. The counselor will have to remember that the student
did not indicate that he wanted to talk about the subject
checked and that he may actually resent any attempt on the
part of the counselor to start such a discussion. When an
item has been checked only once, the counselor can often
discuss the problem mentioned and give the student an
opportunity to ask questions without making the student aware that
the item itself is being referred to.

Sarah W. placed a single check opposite the item, “I have
not obtained parental approval of my vocational plans.”
During the interview, after discussing various alternatives, the
counselor asked, “What do you think your parents would like
you to do?” Sarah then told what her parents' reactions were
and also revealed a family problem which had not even been
suspected up to that point in the interview. If the counselor
had asked, “Why don’t your parents approve of your
vocational plans?” it is doubtful if Sarah would have given the
information she actually gave.

Summary

Statistical analysis of a problem check list and its clinical
use have shown that it is a useful instrument in diagnosing
students’ problems and in approaching these problems in the
interview. The items checked offer the counselor an
opportunity to select those areas which offer most promise for
investigation and to introduce these topics in the counseling
interview. The items also assist in orienting the student
toward the counselor and in reaching a definition of his
problems before the interview.

REFERENCES 

(1) Moody, R. Problems Check List. Columbus, Ohio: Ohio
University Press, 1941.

(2) Symonds, P. K. “Life Problems and Interests of Adolescents,”
School Review, XLIV (1936), 506-518.

(3) Williamson, E. G. How to Counsel Students. New York:
McGraw-Hill, 1939.

(4) Williamson, E. G. and Darley, J. G. Student Personnel Work.
New York: McGraw-Hill, 1937.

(5) Wrenn, C. G. Study-Habits Inventory. California: Stanford
University Press, 1941.





A COMPARISON OF THE HUMAN BEHAVIOR
INVENTORY WITH TWO OTHER
PERSONALITY INVENTORIES


ABRAHAM SPERLING
City College of New York 


PENCIL-AND-PAPER TESTS for diagnosing personality
traits have too frequently proved unsatisfactory to the
investigators employing them. Statements expressing
discontent with the diagnostic results of such tests are found in
studies by Watson (1), Mosier (2), Landis (3), Moore and
Steele (4), Feder and Mallet (5), Gorham and Brotemarkle
(6), Stagner (7), and others too numerous to include here.
Accompanying the criticisms, however, constructive
suggestions are frequently made for the improvement of such
instruments. Among the suggestions offered are the use of multiple
answers, the use of weighted scoring, the development of
reliability, better definition of terms, and abstention from scoring
the same items for more than one trait. Because it is felt that
the Human Behavior Inventory,¹ devised by Randolph B.
Smith, represents an improvement in adjustment scales in
accordance with these suggestions, it is the desire of the
investigator to bring this instrument to the attention of possible
users.

Employed in an experimental study (8) conducted by the
investigator, the Human Behavior Inventory proved to be a
most satisfactory instrument for measuring traits of
personality adjustment. It was devised for the purpose of testing
personality adjustment of a group of college students. Smith
developed the instrument by selecting from previous inventories
the items found most diagnostic, modifying them in an effort
to make them capable of measuring status as well as change,
and adding new items where necessary. In his original study,
Smith (9) employed the test to measure the personality status
of college students at the beginning and end of a school year
in which the individuals had been subjected to a course in
mental hygiene.

¹ This inventory is reproduced in a monograph by Smith (9).

Description of the Test 

The inventory was developed to yield a total score which 
might serve as a measure of general personality adjustment, 
together with separate subscores on six individual sections: 
(1) work efficiency; (2) superiority-inferiority, or degree of 
self-confidence; (3) social acceptability and adjustment; 
(4) emotional stability with reference to neurotic symptoms, 
ease of adjustment to new experiences, and general sex 
adjustments; (5) objectivity toward behavior of others; and 
(6) family attitudes and relationships. These sections may be 
regarded as major characteristics of mental health and 
emotional maturity. The scale contains 102 questions, answers 
to which are based on a five-degree multiple choice. Each item 
is given a score ranging from 0 to 4, depending upon the degree 
of the answer. The estimated reliability of the total test 
score by odd-even correlation for 1125 cases was .89 ± .01 and 
by test-retest correlation for 465 cases after six months was 
.81 ± .01. 
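
The odd-even estimate mentioned above can be sketched as follows. This is a minimal illustration only: the function names and the sample responses are hypothetical, and the Spearman-Brown step-up is an assumption, since the text does not say whether the reported .89 was stepped up to full length.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def odd_even_reliability(responses):
    """Correlate odd-item and even-item half scores.

    `responses` holds one list of item scores (0 to 4) per respondent.
    Returns the half-test correlation and, as an assumed extra step,
    the Spearman-Brown estimate for the full-length test.
    """
    odd_half = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even_half = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical responses: five respondents, six items each.
sample = [
    [0, 1, 0, 2, 1, 1],
    [2, 2, 3, 1, 2, 3],
    [4, 3, 4, 4, 3, 4],
    [1, 0, 2, 1, 1, 0],
    [3, 4, 2, 3, 4, 3],
]
r_half, r_full = odd_even_reliability(sample)
print(f"half-test r = {r_half:.3f}, stepped-up estimate = {r_full:.3f}")
```

With the actual 102-item inventory each half score would be the sum of 51 items.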

Procedure 

This investigation was undertaken to compare the Human 
Behavior Inventory with previously constructed scales. Several 
of the instructors in elementary psychology at the College 
of the City of New York administered to their classes the 
Human Behavior Inventory, the Bell Adjustment Inventory 
(10), and the short form of the Thurstone Personality Schedule 
as revised by R. R. Willoughby, which is known as the 
Clark-Thurstone Inventory (11). Statistical data concerning 
these scales are described in the bibliographical references 
noted. 

To each class in elementary psychology both the Human 
Behavior Inventory and the Clark-Thurstone Inventory were 




given during the same period. The Bell Inventory was given 
during the subsequent class meeting. One hundred seven com¬ 
plete sets of inventories were made available to the investi¬ 
gator. 

The Data 

The intercorrelations of the three scales are presented 
in Table 1, while other statistics concerning each inventory 
are given in Table 2. 

TABLE 1 

INTERCORRELATIONS AMONG HUMAN BEHAVIOR INVENTORY, BELL 
INVENTORY, AND CLARK-THURSTONE INVENTORY 

                                 Coefficient of   No. of Items in     No. of Identical 
Inventories                      Correlation      Respective Scales   Items Between Scales 

Human Behavior Inventory                          102 
  and Bell Inventory             .736 ± .030      140                 12 

Clark-Thurstone                                   25 
  and Human Behavior Inventory   .748 ± .029      102                 7 

Bell Inventory                                    140 
  and Clark-Thurstone            .785 ± .026      25                  18 
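
The "±" figures attached to the coefficients in Table 1 look like probable errors. The sketch below assumes the classical formula P.E. = 0.6745(1 - r²)/√N with N = 107; the source does not state which formula was used, so this is an inference. Under that assumption the first two printed values are reproduced, while the third comes out at about .025 against the printed .026.

```python
from math import sqrt


def probable_error_r(r, n):
    """Classical probable error of a correlation coefficient.

    0.6745 is the normal deviate bounding the central 50 per cent
    of the distribution. Assumed formula, not confirmed by the source.
    """
    return 0.6745 * (1 - r ** 2) / sqrt(n)


# Coefficients as printed in Table 1; N = 107 complete sets of inventories.
for label, r in [
    ("Human Behavior and Bell", 0.736),
    ("Clark-Thurstone and Human Behavior", 0.748),
    ("Bell and Clark-Thurstone", 0.785),
]:
    print(f"{label}: r = {r:.3f}, P.E. = {probable_error_r(r, 107):.3f}")
```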



TABLE 2 

SCORES ON HUMAN BEHAVIOR INVENTORY, BELL INVENTORY, AND 
CLARK-THURSTONE INVENTORY* 

                                                                      Coefficient of 
Scales                       Range      Mean          S.D.            Reliability 

Human Behavior Inventory     36-202     120   (123)   38.55 (39.23)   .918 (.89) 
Bell Inventory               1-77       33.2  (32)    16.50                (.93) 
Clark-Thurstone Inventory    2-67       26    (29)    15.07 (13.70)        (.91) 
Age                          17-24.6    19.5 

N = 107 

*Figures in parentheses are from the original studies by the authors. 



The fact that the Human Behavior Inventory correlates 
rather highly with the two older scales should not give the 
impression that it is a duplication of the exact content of the 
scales with which it was compared. However, the high 
correlations probably indicate that it tends to measure the same 
factors, namely, those of personality adjustment. To check 
whether the high correlations were due to mere identity of the 
items in the scales, an analysis of the three questionnaires was 
made. 

The analysis (summarized in Table 1) showed that there 
were eighteen identical items in the Bell and Clark-Thurstone, 
seven in the Human Behavior Inventory and the Clark-Thur¬ 
stone, and twelve in the Human Behavior Inventory and the 
Bell. While it is thus seen that exact identity of items alone 
does not provide a reasonable explanation of the fairly high 
correlations between the scales, it is of course recognized that 
similarity of items not identical may also be a factor. 

The intercorrelations among the sections of the scales 
have been reproduced for two reasons: first, to offer a record 
of the data for the benefit of others who may wish to make 
comparisons and, second, to demonstrate the similarity of 
results obtained in this study and in the original studies by the 
respective authors. To illustrate, Tables 3 and 4 show the 
closeness of the mean scores and intercorrelations obtained 
from the 107 subjects of this study to those of the 1145 of 
Smith’s study and the 258 of Bell’s study. The rather high 
interrelationships among the parts of the Human Behavior 
Inventory may be an indication that the sub-scores do not 
represent separate psychological factors. However, it is 
possible that they are indicative of truly close relationships 
among the several personality traits measured by the 
subsections. Further exploration of these possibilities may well be 
the subject of a subsequent investigation. 

TABLE 3 

CORRELATIONS OF PARTS OF BELL INVENTORY WITH EACH OTHER 
AND WITH TOTAL* 

Parts         Health         Social         Emotional      Total 

Home          .39 (.43)      .21 (.04)      .54 (.38)      .757 
Health                       .18 (.24)      .45 (.53)      .629 
Social                                      .44 (.47)      .655 
Emotional                                                  .832 

*Figures in parentheses are from the original study by the author. 


TABLE 4 

CORRELATIONS OF PARTS OF HUMAN BEHAVIOR INVENTORY WITH 
EACH OTHER AND WITH TOTAL* 

             Sup.    Soc.    Emot.           Fam.            Total Less                      No. of Items 
Parts**      Inf.    Acc.    Stab.   Obj.    Rel.    Total   This Sec.    Mean      S.D.     in Sec. 

Work Eff.    .60     .47     .56     .42     .38     .58     .508         12.87     4.81     9 
             (.59)   (.52)   (.56)   (.40)   (.44)   (.68)                (12.94)   (4.90) 

Sup. Inf.            .68     .73     .59     .43     .78     .772         14.12     5.86     11 
                     (.66)   (.65)   (.46)   (.46)   (.76)                (15.57)   (5.96) 

Soc. Acc.                    .76     .56     .53     .80     .797         16.86     6.66     13 
                             (.72)   (.47)   (.54)   (.80)                (17.75)   (7.05) 

Emot. Stab.                          .66     .64     .92     .908         28.59     11.20    29 
                                     (.63)   (.60)   (.88)                (30.22)   (11.55) 

Obj.                                         .45     .77     .615         18.58     7.43     14 
                                             (.55)   (.74)                (18.08)   (6.93) 

Fam. Rel.                                            .78     .607         29.43     12.26    26 
                                                     (.85)                (28.50)   (11.85) 

*Figures in parentheses are from the original study by R. B. Smith. 

**The abbreviations of the subsection names refer to Work Efficiency, 
Superiority-Inferiority, Social Acceptability, Emotional Stability, 
Objectivity, and Family Relationships. 


It may be pertinent to mention at this point that, in the 
opinion of the investigator, the importance of establishing 
rapport between experimenter and subject for best results from 
pencil-and-paper tests of personality cannot be overemphasized. 
In this study, extreme care was taken in the matter of 
rapport. In the original instructions each student was asked 
to volunteer his efforts in a research study that would have no 
bearing on his grades or standing at the college. Each 
individual was told that his replies would be treated entirely 
confidentially and anonymously unless he desired otherwise. He 
was asked to be sincere and objective and to inform the 
investigator if he felt his rapport was not valid. As an added 
incentive for an honest expression of their own characteristics 
as they know them, the students were told that they would be 
given the results of their tests in such a manner that they 
could compare their scores with the average of others taking 
part in the study if they so desired. A majority of the 
students were known to the investigator in a rather friendly 
student-teacher relationship. It is the opinion of this investigator 
that disappointing results from the use of personality 
questionnaires are frequently due to a lack of rapport between 
experimenter and subjects. 

Summary and Conclusion 

The coefficient of correlation between the Human Behavior 
Inventory and the Bell Inventory was .736, that between 
the Human Behavior Inventory and the Clark-Thurstone 
Inventory .748, and that between the Bell and the 
Clark-Thurstone .785. An analysis of the three measures showed 
more identical items between the Bell and the Clark-Thurstone 
scales than between the Human Behavior Inventory and either 
of these measures. 

In view of the similar positive coefficients of intercorrelation 
among the three scales, it may be concluded that the 
Human Behavior Inventory is probably as satisfactory for use 
as a diagnostic measure of personality adjustment as either 
of the other two measures with which it was compared. 
Moreover, the scale embodies several desirable features such as the 
use of multiple answers, weighted scoring, high reliability, 
clear definition of terms, and abstention from scoring the same 
items for more than one trait. Since these aspects are among 
the suggestions made by authorities for the improvement of 
personality scales, they lend support to the acceptance of the 
Human Behavior Inventory as an instrument for measuring 
traits of personality adjustment. 




REFERENCES 

1. Watson, G. “Personality and Character Measurement,” Review 
of Educational Research, VIII (1938), 269-291. 

2. Mosier, C. I. “On the Validity of Neurotic Questionnaires,” 
Journal of Social Psychology, IX (1938), 3-16. 

3. Landis, C. “Empirical Evaluation of Three Personality 
Adjustment Inventories,” Journal of Educational Psychology, XXVI 
(1935), 321-330. 

4. Moore, H. and Steele, I. “Personality Tests,” Journal of 
Abnormal and Social Psychology, XXIX (1934), 45-52. 

5. Feder, D. and Mallet, D. “Validity of Certain Measures of 
Personality Adjustment,” Journal of American Association of 
College Registrars, XIII No. 1 (1937), 5-15. 

6. Gorham, D. R. and Brotemarkle, R. “Challenging Three 
Standardized Emotionality Tests for Validity and Employability,” 
Journal of Applied Psychology, XIII (1929), 554-588. 

7. Stagner, R. “The Intercorrelation of Some Standardized 
Personality Tests,” Journal of Applied Psychology, XVI (1932), 
453-464. 

8. Sperling, A. The Relationship between Personality Adjustment 
and Achievement in Physical Education Activities, Doctoral 
dissertation, 1941. On file in the library of New York University, 
New York. 

9. Smith, R. B. Growth in Personality Adjustment Through Mental 
Hygiene, Albany, New York: University of the State of New 
York, State Education Department, 1936. 

10. Bell, H. M. Manual for the Adjustment Inventory, Stanford 
University, California: Stanford University Press, 1934. 

11. Willoughby, R. R. “Some Properties of the Thurstone 
Personality Schedule and a Suggested Revision,” Journal of Social 
Psychology, III (1932), 401-424. 






INTRA-INDIVIDUAL DIFFERENCES VERSUS 
INTER-INDIVIDUAL DIFFERENCES 
IN MOTOR SKILLS¹ 


WILLIAM A. OWENS, JR. 

Iowa State College 

STUDIES OF VARIATION within and between individuals 
have tended to be restricted to one sort of intra-individual 
variation, trait differences. They have also tended, 
in treating of the relative magnitudes of individual differences 
and trait differences, to display adherence to one of two modes 
of attack. Either they have dealt with the inter-correlations 
of certain traits or functions, or they have shown a comparison 
of a trait standard deviation with a standard deviation rep¬ 
resentative of individual differences. 

The present paper is an attempt to evaluate trait differ¬ 
ences and certain other intra-individual factors, and to relate 
them in magnitude to individual differences. 

The writer feels that the statistical technique which was 
employed in the present investigation was superior to either 
of the two which are conventionally used for the reasons which 
follow. First, neither the method of inter-correlation nor the 
method of comparing standard deviations will allow of the 
treatment of more than one intra-individual factor. Second, 
even in the comparison of individual and trait differences, the 


¹This article is a condensation of the writer’s doctoral dissertation of the 
same title, a copy of which is on file at the library of the University of 
Minnesota. 

The writer wishes to acknowledge the invaluable criticisms and suggestions 
of his advisors, Professor D. G. Paterson and Dr. P. O. Johnson. He also 
wishes to recognize the assistance of Dr. Brent Baxter and of Mr. Paul G. 
Homeyer. 

The actual experimental work was done in the psychological laboratories at 
the University of Minnesota with the cooperation of Dr. M. A. Tinker, and was 
made possible through a research grant by the Graduate School of that 
institution. 




magnitude of the correlation coefficient is conditioned by at 
least two variables besides the true magnitude of the 
relationship. These are unreliability of measurement and trait 
variability.² Third, product-moment correlation, or its equivalent, 
deals with absolute deviate ranks; no account of variation 
within these rank positions is taken so long as they do not 
actually shift.³ Fourth, standard deviations are increased by 
unreliabilities of measurement. If a systematic error were 
present, the error in measuring trait differences would not be 
equal to the error in measuring individual differences. Even 
if this were not the case, a constant error factor would be a 
relatively larger component of trait differences (if they were 
the smaller) than of individual differences. A statement of 
proportionality could, thus, only be accurate if individual and 
trait differences were of the same magnitude. 

The present experiment, which was designed for applica¬ 
tion of the analysis of variance, was planned with a view to 
taking account of these several objections to other techniques. 
In accordance with this purpose, certain facts are worth 
noting. First, account is taken of more than one intra-indi¬ 
vidual factor. Second, various sorts of unreliability are incor¬ 
porated in the estimate of error with at least two relevant 
consequences: the estimates of the magnitudes of the main 
factors are correspondingly more accurate, and direct tests of 


²Since terminology in this field has not been entirely uniform, the writer 
includes his own definitions of the terms he employs. 

Inter-individual differences = differences in relative proficiency from 
individual to individual. Intra-individual differences = (1) trait differences, 
(2) repetitive variations, (3) trait variability, et al. 

Trait differences = differences in relative proficiency from function to 
function within the individual. 

Repetitive variations = changes in the individual's proficiency from day to 
day in the average of all functions measured. The systematic portion of this 
shift might be designated as learning, and the random portion attributed to 
shifts in the subject's efficiency. 

Trait variability = the term used by Paulsen to denote the fluctuation of a 
given function within an individual, temporally. See Paulsen, G. B. “A 
Coefficient of Trait Variability,” Psychological Bulletin, XXVIII (1935), 218-19. 

³Harris has taken account of such a contention in developing his method of 
relative correlation. The procedure is to correlate a first variable with the 
deviation of a second from its most probable value. Harris, J. A. “The 
Correlation Between a Variable and the Deviation of a Dependent Variable from 
its Most Probable Value,” Biometrika, VI (1908), 438-443. 



the significances of these factors are made possible. Third, the 
analysis is so based as to minimize the errors normally 
incurred in the ranking of data, while the statistical technique 
employed takes account of the total variation; none of it goes 
exempt from analysis. With this brief preface, the present 
experiment may be outlined. 

The Problem —To obtain an estimate of the relative mag¬ 
nitudes of individual differences and of several intra-individual 
factors on some tests of motor skills. 

The Method —Table 1 provides an abbreviated illustration 
of the technique employed to determine the per cent of 
the total variation in score contributed by individual 
differences. 

TABLE 1 

A SAMPLE ANALYSIS 

Individuals      Administrations: Block Packing 
(15)             II       III      IV    . . .  VIII 

A                425      448      425 
B                502      469      530 
C                514      541      648 
D                384      403      392 
E                345      436      463 
F                512      467      577 
G                511      611      623 
H                241      372      428 
I                522      573      548 
J                453      498      567 
K                317      380      302 
L                461      547      517 
M                394      478      496 
N                534      567      631 
O                381      441      491 

Norm: M = 500, σ = 100 

Correction Term        = (21365)²/45 = 10,143,627.22 
Total Variation        = 356,569.78 
Individual Differences = 272,389.11 
Repetitive Variations  = 44,667.51 
Error (Interaction)    = 39,513.16 

          Degrees of    Sum of          Mean 
Factor    Freedom       Squares         Square        “F”      P        % 

T.V.      44            356,569.78                                      100 
I.D.      14            272,389.11      19,456.37     13.79    < .01    68 
R.V.      2             44,667.51       22,333.76     15.83    < .01    16 
Err.      28            39,513.16       1,411.18                        16 




The analysis of variance was employed with two criteria 
of classification, individuals and administrations.⁴ The 
isolates from the total variation (T.V.) were individual 
differences (I.D.),⁵ repetitive variations (R.V.), and an 
estimate of error (Err.). Especially to be noted is the fact that 
the per cent column furnishes an estimate of the magnitude of 
the contribution of individual differences to the total variation. 
An analysis of this sort was run for each test of the present 
experiment. 
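
The sample analysis of Table 1 can be retraced with a short two-way analysis of variance (one score per cell). The sums of squares, mean squares, and F ratios below follow the standard computation. The per cent column is a different matter: the text does not say how it was derived, and the sketch assumes it comes from variance components estimated from the mean squares, an inference that happens to reproduce the printed 68, 16, and 16. Function and variable names are illustrative.

```python
# Block packing scores from Table 1: 15 individuals (rows, A to O)
# by administrations II, III, IV (columns).
scores = [
    [425, 448, 425], [502, 469, 530], [514, 541, 648], [384, 403, 392],
    [345, 436, 463], [512, 467, 577], [511, 611, 623], [241, 372, 428],
    [522, 573, 548], [453, 498, 567], [317, 380, 302], [461, 547, 517],
    [394, 478, 496], [534, 567, 631], [381, 441, 491],
]


def two_way_anova(table):
    """Two criteria of classification, one observation per cell."""
    n_rows, n_cols = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    correction = grand_total ** 2 / (n_rows * n_cols)

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_rows = sum(sum(row) ** 2 for row in table) / n_cols - correction
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    ss_cols = sum(t * t for t in col_totals) / n_rows - correction
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n_rows - 1)
    ms_cols = ss_cols / (n_cols - 1)
    ms_err = ss_err / ((n_rows - 1) * (n_cols - 1))

    # Assumed derivation of the per cent column: variance components
    # from expected mean squares (negative estimates set to zero).
    var_rows = max((ms_rows - ms_err) / n_cols, 0.0)
    var_cols = max((ms_cols - ms_err) / n_rows, 0.0)
    var_sum = var_rows + var_cols + ms_err

    return {
        "correction": correction,
        "ss_total": ss_total, "ss_rows": ss_rows,
        "ss_cols": ss_cols, "ss_err": ss_err,
        "f_rows": ms_rows / ms_err, "f_cols": ms_cols / ms_err,
        "pct_rows": 100 * var_rows / var_sum,
        "pct_cols": 100 * var_cols / var_sum,
        "pct_err": 100 * ms_err / var_sum,
    }


result = two_way_anova(scores)
for name, value in result.items():
    print(f"{name:>10}: {value:,.2f}")
```

Here rows stand for individual differences (I.D.) and columns for repetitive variations (R.V.); the same function applies to the second series of analyses if tests are entered as rows.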

The analysis of intra-individual factors took a similar 
form. Table 2 illustrates the method and the character of 
the second series of analyses. The isolates from the total 
variation (T.V.) were trait differences (T.D.), repetitive 
variations (R.V.), and error (Err.). An analysis of this sort 
was made for each subject in the experimental group. The 
intention was ultimately to compare the per cent contribution of 
individual differences to the total variation in the first series 
of analyses with the respective per cent contributions of trait 
differences and repetitive variations to the total variation in 
the second series of analyses. 

Seven tests of motor skills were employed in this 
investigation. They were the block packing, steadiness, speed of 
movement, slow movement, stick balancing, tapping, and card 
sorting tests reported in the Minnesota Mechanical Ability 
Study. Each one of 15 subjects was given eight administrations 
of each of the seven tests, or 56 testings per subject.⁶ The 
tests were administered in systematically varied order on a 
schedule calling for about eight hours of each person’s time. 
Subjects were junior high school boys matched for age, 
intelligence, and race, both with each other and with a norm group 
of 216 individuals.⁷ These boys were paid for their time, and 

⁴C. H. Goulden, Methods of Statistical Analysis (New York: John Wiley 
and Sons, 1939), pp. 114-141; especially p. 127. 

⁵From this type of analysis only the individual differences factor figures in 
later comparisons. 

⁶Results from all first administrations were, as usual, discarded as 
unreliable. 

⁷D. G. Paterson, R. M. Elliott, et al. Minnesota Mechanical Ability Tests 
(Minneapolis: University of Minnesota Press, 1930), p. 586. 



TABLE 2 

A SAMPLE ANALYSIS 

Tests                 Administrations: Subject E 
(6)                   II       III      IV    . . .  VIII 

Block Packing         345      436      463 
Steadiness            370      370      306 
Slow Movement         512      618      583 
Speed of Movement     652      712      649 
Tapping               557      601      619 
Stick Balancing       520      525      538 

Norm: M = 500, σ = 100 

Correction Term      = (9376)²/18 = 4,883,854.22 
Total Variation      = 234,357.78 
Trait Differences    = 213,415.11 
Repetitive Variation = 8,069.78 
Error (Interaction)  = 12,872.89 

          Degrees of    Sum of          Mean 
Factor    Freedom       Squares         Square        “F”      P         % 

T.V.      17            234,357.78                                       100 
T.D.      5             213,415.11      42,683.02     33.16    < .01     89 
R.V.      2             8,069.78        4,034.89      3.13     > .05†    3 
Err.      10            12,872.89       1,287.29                         8 

†See a later reference on combining independent probabilities. 


several prizes were awarded at the completion of the testing. 
Motivation appeared to be excellent. 

Two methodological issues now demand attention. First, 
in order to evaluate trait differences (differences in the 
subjects’ relative proficiency from test to test), the various tests 
themselves had to be equated. This was accomplished in the 
following fashion. The seven norm distributions of 216 cases 
each were checked for normality, and the five which departed 
from the criterion were normalized. The pertinent data are 
included in Tables 3 and 4. These distributions were then 
assigned comparable scores after the method described by 
Hull, McCall, et al.⁸ Specifically, each distribution was 
assigned a mean of 500 and a standard deviation of 100. The 
scores of the subjects in the present experimental group were 
converted to this form and evaluated in terms of these equated 

⁸C. L. Hull, “The Conversion of Test Scores into Series Which Shall Have 
Any Assigned Mean and Degree of Dispersion,” Journal of Applied 
Psychology, VI (1922), 298-300. 




TABLE 3 

NORMS 

Test                  M          σ          Transformation      Value                  N 

Block Packing*        2.73[?]    .064       logarithm           low score is good      217 
  T.S.V.**            549        83 
  O.S.V.***           571        87 

Slow Movement*        1.518      .349       logarithm           low score is good      217 
  T.S.V.              22         26 
  O.S.V.              41         34 

Steadiness*           2.382      .500       square root         high score is good     217 
  T.S.V.              5.67       2.38 
  O.S.V.              6.18       2.48 

[Tapping]*            20.514     .898       square root         high score is good     217 
  T.S.V.              421        34 
  O.S.V.              426        37 

Stick Balancing*      1.089      .205       square root         high score is good     216 
  T.S.V.              15         21         of logarithm 
  O.S.V.              35         72 

Speed of Movement     150.06     30.138     none                high score is good     217 
  T.S.V.              150.06     " 
  O.S.V.              150.06     " 

*= value in transformed distribution. 

**T.S.V. = transformed score value. 

***O.S.V. = original score value. 


norm distributions. Any tendency to minimize the magnitude 
of trait differences may thus be viewed as a function of the 
sampling error of a mean of 216 cases.⁹ 
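
The conversion to a common scale with a mean of 500 and a standard deviation of 100 can be sketched as below. The linear form follows from the stated mean and standard deviation; applying the normalizing transformation first and reversing the sign for tests on which a low score is good are assumptions made for illustration, as is the function name. The norm figures in the usage lines are those printed in Table 3.

```python
from math import log10, sqrt


def scaled_score(raw, norm_mean, norm_sd, transform=None, low_is_good=False):
    """Express a raw score on a scale with mean 500 and S.D. 100.

    norm_mean and norm_sd refer to the norm group in transformed units.
    transform is the normalizing function (log10, sqrt, or None).
    low_is_good flips the sign so that a higher scaled score always
    means greater proficiency (an assumption, not stated in the text).
    """
    value = transform(raw) if transform else raw
    z = (value - norm_mean) / norm_sd
    if low_is_good:
        z = -z
    return 500 + 100 * z


# Steadiness: square-root transformation, norm M = 2.382, sigma = .500.
print(scaled_score(6.18, 2.382, 0.500, transform=sqrt))

# Speed of movement: no transformation, norm M = 150.06, sigma = 30.138.
print(scaled_score(165.0, 150.06, 30.138))

# Slow movement: logarithmic transformation, low score is good.
print(scaled_score(41.0, 1.518, 0.349, transform=log10, low_is_good=True))
```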

The second methodological consideration was the time-honored 
one of securing a zero point and equal units of measurement 
on the scale for the evaluation of individual differences. 
This would, of course, be necessary in order to justify 
the ultimate pooling of the results. Anastasi¹⁰ has pointed out 
that standard scores from scaled, or normal, distributions tend 
to yield such equal units of measurement. Also, the analysis 
of variance deals only with deviates or differences, and not 
with absolute magnitudes. These two considerations seem to 
point to the adequacy of the present data and technique for 
the purpose in view. 


⁹A trial analysis of the scores on the speed of movement test was made 
using both the original and the transformed measures. The relative magnitudes 
of the respective factors were identical to two decimal places by the two 
methods. Apparently, none of the information latent in the data is lost through 
the transformation. 


TABLE 4 

TESTS OF NORMALITY 

Tests                 N       G₁       G₂       σG₁       σG₂       P 

Block Packing         217     0.315    0.529    0.1655    0.3286    >.01 
Slow Movement         217     0.761    0.845    "         "         " 
Speed of Movement     217     0.312    0.049    "         "         " 
Tapping               217     0.169    0.356    "         "         " 
Steadiness            217     0.362    0.144    "         "         " 
Stick Balancing       216     0.040    0.145    "         "         " 
Card Sorting          219     0.253    0.017    later omitted 

G₁ and G₂ are calculated from R. A. Fisher’s K statistics. A 
complete description of the method is to be found in Goulden, C. H. 
Methods of Statistical Analysis (New York: John Wiley & Sons, 
1939), pp. 27-31. 
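
As an illustration of the normality check reported in Table 4, the sketch below computes g₁ and g₂ from Fisher's k statistics together with their standard errors. The formulas are the standard ones for k statistics; whether they match the procedure in Goulden in every detail is an assumption. For N = 217 the standard errors come out near 0.165 and 0.329, close to the 0.1655 and 0.3286 printed in the table.

```python
from math import sqrt


def fisher_g(data):
    """Return (g1, g2): skewness and kurtosis measures from k statistics."""
    n = len(data)
    mean = sum(data) / n
    s2 = sum((x - mean) ** 2 for x in data)
    s3 = sum((x - mean) ** 3 for x in data)
    s4 = sum((x - mean) ** 4 for x in data)

    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / (
        (n - 1) * (n - 2) * (n - 3)
    )
    return k3 / k2 ** 1.5, k4 / k2 ** 2


def se_g1(n):
    """Standard error of g1 under normality."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_g2(n):
    """Standard error of g2 under normality."""
    return sqrt(
        24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5))
    )


# Hypothetical scores, used only to show the call pattern.
sample = [12, 15, 11, 19, 14, 13, 22, 16, 15, 17, 13, 18]
g1, g2 = fisher_g(sample)
n = len(sample)
print(f"g1 = {g1:.3f} (S.E. {se_g1(n):.3f}), g2 = {g2:.3f} (S.E. {se_g2(n):.3f})")
```

A distribution would be treated as departing from normality when g₁ or g₂ is large relative to its standard error.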

It would have been ideal to establish the normality of the 
distribution of trait differences within each individual in 
similar fashion. However, the number of traits measured was 
so small that this constituted a practical impossibility. Hull¹¹ 
has stated that the distribution of trait differences appears to 
be a normal one. The data of the present study would affirm 
this opinion, although no conclusions may be based on the 
inspectional method employed. In any case, the error, if any, 
introduced by assuming the normality of the distribution of 
trait differences would be very slight. 

On the assumption that a satisfactory estimate of the rela¬ 
tive magnitudes of individual and trait differences might be 
obtained, an attempt was made to isolate certain other sources 

¹⁰A. Anastasi, “Practice and Variability,” Psychological Monographs, XLV 
(1933-34), No. 5. 

¹¹C. L. Hull, “Variability in Amount of Different Traits Possessed by the 
Individual,” Journal of Educational Psychology, XVIII (1927), 97-106. 



of intra-individual variation. In the second type of analysis, 
illustrated in Table 2, the repetitive variations factor (R.V.) 
is seen to be composed of differences between the means of 
administrations. A direct estimate of the magnitude of this 
factor was obtained, as in the case of trait differences, by 
determining its mean per cent contribution to the total 
variation from each separate analysis of the second type.¹² 
However, the differences between the means of the administrations 
may be viewed as being attributable to two distinct sources: 
first, learning; and second, random fluctuations in the 
individual’s efficiency from day to day in all functions. An attempt 
was made to differentiate between the two. 

Briefly, it was assumed that learning would be at a 
maximum on administrations 2-4, and at a minimum and 
negligible level on administrations 6-8. The evidence for this 
assumption follows. (1) If the proposed dichotomy in 
administrations (2-4 vs. 6-8) is made, the repetitive variations 
factor is significant in the second summary analysis of 
administrations 2-4, and is insignificant in the second 
summary analysis of administrations 6-8. Table 5 gives the 
relevant data, and the probability (P) column illustrates the 
point. (2) Also to be noted in Table 5 is the fact that the 
repetitive variations mean square is only slightly larger than 
the error mean square for administrations six through eight. 
(3) The establishment of a common unit for the estimation 
of improvement makes it apparent that most learning is 
confined to administrations 2-5. The “t” test is generalized in 
Fisher’s concept of fiducial probability to yield an expression 
as to the magnitude which any difference must attain to be 
“significant” at any given level of probability.¹³ Specifically, 
in the present instance, the fiducial limits at the 10 per cent 

¹²One analysis for each subject in the experimental group, each one in the 
form illustrated in Table 2. It makes no essential difference in the results 
whether a per cent is obtained in each analysis and the mean of the series 
obtained, or whether the sums of squares and degrees of freedom are totaled 
and one per cent computed from these “summary statistics.” The latter method 
is probably preferable for purposes of estimation. 

¹³R. A. Fisher, Statistical Methods for Research Workers (London: Oliver 
& Boyd, 1937). 




TABLE 5 

REPETITIVE VARIATIONS: 
LEARNING VS. RANDOM FLUCTUATIONS* 

          Degrees of    Sum of            Mean 
Factor    Freedom       Squares           Square        “F”      P        % 

                        Administrations 2-4 

T.V.      255           2,828,723.05                                      100 
T.D.      75            2,441,890.33      32,558.54     18.68    <.01     83 
R.V.      30            125,390.68        4,179.68      2.40     <.01     3 
Err.      150           261,442.04        1,742.95                        14 

                        Administrations 6-8 

T.V.      255           2,861,646.86                                      100 
T.D.      75            2,522,178.81      33,629.05     18.22    <.01     85 
R.V.      30            62,656.82         2,088.56      1.13     >.05     0.3 
Err.      150           276,811.23        1,845.41                        15 

*These are summary statistics derived by totaling the sums of squares and 
degrees of freedom of the separate analyses. 


level were used as a qualitative, common unit for the 
measurement of improvement from administration 2 through 
administration 8. Table 6 contains a summary on this point. 

It should be noted that “improvement” in each individual 
case is defined in terms of the amount of variation which may 
be viewed as “random.” In accordance with the previously 
stated hypothesis, it will be observed that there is only one 
exception to the rule that learning, if present, tends to be 
confined to the first five administrations.¹⁴ In view of the 
evidence presented, it was assumed that the difference between 
the initial (2-4) and final (6-8) magnitudes of the repetitive 
variations factor might furnish an estimate of the relative 
importance of learning. 
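
The comparison of initial and final repetitive variations can be illustrated from the summary statistics of Table 5. The sketch assumes, as an inference from the printed per cents rather than a stated procedure, that each per cent is a variance component estimated from the mean squares (six tests and three administrations per subject in each block). The trait differences sum of squares for administrations 6-8 is entered as 2,522,178.81, the value implied by the printed mean square and by the total.

```python
def component_percents(ss_td, df_td, ss_rv, df_rv, ss_err, df_err,
                       n_tests=6, n_admins=3):
    """Per cent contributions of T.D., R.V., and error.

    Assumed method: variance components from expected mean squares,
    with negative estimates set to zero.
    """
    ms_td = ss_td / df_td
    ms_rv = ss_rv / df_rv
    ms_err = ss_err / df_err
    var_td = max((ms_td - ms_err) / n_admins, 0.0)
    var_rv = max((ms_rv - ms_err) / n_tests, 0.0)
    total = var_td + var_rv + ms_err
    return (100 * var_td / total,
            100 * var_rv / total,
            100 * ms_err / total)


# Summary statistics from Table 5.
initial = component_percents(2441890.33, 75, 125390.68, 30, 261442.04, 150)
final = component_percents(2522178.81, 75, 62656.82, 30, 276811.23, 150)

print("Administrations 2-4 (T.D., R.V., Err.):",
      [round(p, 1) for p in initial])
print("Administrations 6-8 (T.D., R.V., Err.):",
      [round(p, 1) for p in final])

# Difference in the R.V. per cent, read here as the share tentatively
# attributable to learning under the stated assumption.
print("R.V. difference:", round(initial[1] - final[1], 1))
```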

Finally, it can be shown that the estimate of error, or 
interaction, in the second series of analyses may have as many 
as three experimental components. These are: (1) 
unreliabilities of measurement, presumably inherent in the test; (2) 
trait variability, presumably inherent in the individual; and 
(3) differential rates of improvement within the individual 


¹⁴It seemed best to omit administration 5 because it appeared to be at the 
inflection point on the learning curve. At best, this method may tend to 
underestimate slightly the role of learning. 




TABLE 6 

FIDUCIAL LIMITS OF LEARNING 

               Administrations (Average of 6 Tests)                       Limits 
Individuals    II     III     IV      V        VI     VII     VIII        10% 

A              519    541     559*    576      592    594     583**       54 
B              573    547     564     602**    613    602     613**       38 
C              569    582     597     636*     621    625     611**       58 
D              543    581     599     635*     662    673     732**       74 
E              493    544*    526     626      593    569     517**       48 
F              628    648     664     667*     684    685     673**       39 
G              566    612     653*    666      677    651     684**       49 
H              521    598*    618     613      640    725*    666         55 
I              518    561     578*    580      591    598     583**       51 
J              600    612     630     633*     631    629     655**       32 
K              537    571     562     592*     599    626     616**       45 
L              550    605*    589     605      620    630     652**       45 
M              541    553     556     555**    554    563     569**       43 
N              590    596     586     610**    627    623     632**       38 
O              467    481     525*    532      564    549     650**       50 

               T = 12/15                       T = 1/15 

Key: * = point at which difference from score on administration 2 or 6 becomes 
as great as fiducial limits. 

** = no difference as great as fiducial limits from administration 2 or 6 
through given point. 

from function to function. Components (2) and (3) are, of course, 
sources of intra-individual variation. However, since it was not 
possible in the present instance to differentiate satisfactorily the 
various factors contributing to the estimate of error,¹⁵ it was 
thought most parsimonious to exclude them from consideration 
as separate entities assignable to either intra- or 
inter-individual sources. 

The Results —It should be stated at the outset that these 
results will be based on only seven administrations of each of 
the various tests, the first administrations being discarded as 
unreliable. They will also include only six tests in the primary 
analysis, since the norms for the card sorting test were found 
to be unsatisfactory. 

Since the present sample is necessarily small, it is interesting 
to note these evidences of its representativeness. First, the 


¹⁵The number of cases would be rather too small to give the curve-fitting 
methods much significance. 




average standard deviation of the experimental group was 
over 90 per cent as large as that of the unmatched and 
unselected norm group. Second, the mean scores of the 
experimental group for administration number two are practically 
identical with those of the norm group. Third, the sample 
was split to allow an estimate of its internal consistency. 
Table 7 shows the result. This sort of consistency is surely 
one evidence of representativeness. These facts, combined, 
suggest the adequacy of the sample. 

TABLE 7

CONSISTENCY OF SAMPLE

Factor     G1      G2

I.D.       69%     75%     Analysis I
T.D.       76      77      Analysis II
R.V.        7       7
Err.       17      16

(Average proportions of total variation from the analyses of both series)
G1 and G2 = respective halves of sample.


A fundamental assumption in the application of the
analysis of variance is that experimental error is distributed
with uniform though unknown variance about a mean of zero.
Accordingly, Nayer’s test¹⁶ for homogeneity of variance was
applied to these data to check this hypothesis. In all cases, the
value of L failed to reach even the 5 per cent level, which
means that the variances within groups are the same. Statistically,
this result justifies the application of the proposed
method to these data.
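
As a rough illustration of the kind of quantity such a homogeneity test examines, the sketch below computes the ratio of the geometric mean to the arithmetic mean of the group variances, which is the usual form of the Neyman and Pearson criterion when all groups are the same size. It is a minimal sketch under that equal-size assumption, not the computation carried out in this study: the function name and sample scores are hypothetical, and judging significance would still require the published tables of percentage limits, which are not reproduced here.

```python
import math

def l1_statistic(groups):
    """Geometric mean of the group variances divided by their arithmetic mean.

    Assumes every group has the same number of scores and a non-zero variance.
    The ratio is 1 when all variances are equal and falls toward 0 as they diverge.
    """
    variances = []
    for scores in groups:
        mean = sum(scores) / len(scores)
        variances.append(sum((x - mean) ** 2 for x in scores) / len(scores))
    geometric = math.exp(sum(math.log(v) for v in variances) / len(variances))
    arithmetic = sum(variances) / len(variances)
    return geometric / arithmetic

# Hypothetical scores: two groups with equal spread, then two with unequal spread.
equal_spread = l1_statistic([[1, 2, 3], [11, 12, 13]])
unequal_spread = l1_statistic([[1, 2, 3], [0, 10, 20]])
print(equal_spread, unequal_spread)
```

A value close to 1 is what homogeneous variances would produce; how far below 1 the ratio must fall before homogeneity is rejected depends on the number of groups and scores.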

Before turning to the results proper, it should be noted
that they have a dual methodological aspect. Actually two
separate problems exist; one is a problem of determining
significance, and the second a problem of estimating magnitudes.
The second problem ceases to exist if the first is not
satisfied. In accordance with this fact, the discussion of
significance will precede that of estimation in what follows.

¹⁶ P. N. Nayer, “An Investigation into the Application of Neyman and
Pearson’s L Test, with Tables of Percentage Limits,” Statistical Research
Memoirs, I (1936), 38-56.


TABLE 8

INDIVIDUAL DIFFERENCES—SUMMARY

          Degrees of      Sum of           Mean
Factor    Freedom         Squares          Square        "F"       P       %

T.V.        618         5,373,669.90
I.D.         78         3,713,633.72     47,610.69      23.43    <.01     72
R.V.         72           709,019.91      9,847.50
Err.        468           951,016.27      2,032.09


TABLE 9

TRAIT DIFFERENCES—SUMMARY

          Degrees of      Sum of           Mean
Factor    Freedom         Squares          Square        "F"       P       %

T.V.        615         7,148,910.67
T.D.         75         5,488,873.31     73,581.62      33.68    <.01     77
R.V.         90           682,345.47      7,581.62       3.49    <.01      7
Err.        450           977,691.67      2,172.65                        16


Tables 8 and 9 show the summary statistics for the two
major series of analyses. These statistics were derived by
adding the sums of squares and degrees of freedom from the
separate analyses of the type illustrated in Tables 1 and 2. It
should be observed that each of the factors is significant above
the 1 per cent level, and that the per cent column contains the
best available estimate of the respective magnitudes. These
per cents are derived from the mean squares for the respective
factors, minus the error mean square, and divided by a correction
for the number of scores contributing to the means involved.
Irwin¹⁷ has, in general, described the method.
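
The derivation of the per cent column can be sketched numerically. The example below is an illustration under stated assumptions rather than the author's worked computation: it takes the three mean squares as printed in Table 9 and assumes that each trait mean rests on seven scores (the seven retained administrations) and each administration mean on six scores (the six tests). Those two divisors, and the function name, are assumptions made for this sketch.

```python
def percent_contributions(ms_traits, ms_reps, ms_error, scores_per_trait_mean, scores_per_rep_mean):
    """Estimate each factor's share of total variation from its mean square.

    Each factor's variance estimate is (mean square - error mean square) divided
    by the number of scores behind each of that factor's means; the error
    variance estimate is the error mean square itself.
    """
    var_traits = (ms_traits - ms_error) / scores_per_trait_mean
    var_reps = (ms_reps - ms_error) / scores_per_rep_mean
    total = var_traits + var_reps + ms_error
    return {
        "T.D.": 100.0 * var_traits / total,
        "R.V.": 100.0 * var_reps / total,
        "Err.": 100.0 * ms_error / total,
    }

# Mean squares as printed in Table 9; divisors 7 and 6 are assumed (see lead-in).
shares = percent_contributions(73581.62, 7581.62, 2172.65, 7, 6)
for factor, share in shares.items():
    print(factor, round(share))
```

Under these assumptions the rounded shares agree with the per cent column of Table 9.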

A question may, however, be raised as to the validity of 
this procedure of adding sums of squares and degrees of 
freedom from the separate analyses of each series on the 
assumption that all the deviations are, in effect, grouped 
about a common grand mean. Although the tests were equated 

¹⁷ J. O. Irwin, “Mathematical Theorems Involved in the Analysis of Variance,”
Journal of Royal Statistical Society, XCIV (1931), 284-300 (especially
pp. 293-296).




With this criticism in mind, the question may best be answered
by demonstrating that the same result is obtained if a method
demanding no such assumption is employed.

First, it should be pointed out that the individual differences
and trait differences factors are highly significant in each
of the separate analyses in which they occur. Their total
would, therefore, of necessity be highly significant. The repetitive
variations factor, however, is not invariably significant
in these separate analyses (cf. Table 2). Its total has, nevertheless,
been shown by the method of adding sums of squares
and degrees of freedom to be significant above the 1 per cent
level. It is, then, this specific result which requires verification.
Fisher¹⁸ has described a method appropriate for pooling
the information from mutually exclusive though similar experiments.
His technique makes it possible to sum the independent
probabilities which arise from independent experiments by
utilizing the fact that the log of the probability to the base
"e" is equal to minus one-half Chi-squared. Two degrees of
freedom are allowed for each independent comparison or
probability value; these degrees of freedom and the log values
are additive. The total may be tested for significance directly
by entering the Chi-squared tables with the appropriate number
of degrees of freedom. The hypothesis in the present
instance is that the repetitive variations and error factors are
from the same population. The obtained value of Chi-squared
is highly significant and refutes this hypothesis as shown in
Table 10. The table also shows that the results obtained by
this method and by the method of totaling sums of squares
and degrees of freedom are comparable, since by either procedure
the obtained probability value exceeds the 1 per cent
level by approximately the same amount.
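
The pooling procedure just described can be sketched in a few lines. The example below is a minimal illustration, not the computation behind Table 10: the function names and the sample probabilities are hypothetical. It sums minus twice the natural log of each probability, allows two degrees of freedom per probability, and evaluates the Chi-squared tail area with the closed form that holds when the degrees of freedom are even, so no table lookup is needed.

```python
import math

def chi_squared_tail_even_df(chi_squared, degrees_of_freedom):
    """Upper-tail probability of Chi-squared for an even number of degrees of freedom."""
    if degrees_of_freedom % 2 != 0 or degrees_of_freedom <= 0:
        raise ValueError("degrees of freedom must be a positive even number")
    half = chi_squared / 2.0
    term = 1.0   # the i = 0 term of the series
    total = 1.0
    for i in range(1, degrees_of_freedom // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def pool_probabilities(p_values):
    """Combine independent probabilities: Chi-squared = -2 * sum(ln p), with 2 df each."""
    chi_squared = -2.0 * sum(math.log(p) for p in p_values)
    degrees_of_freedom = 2 * len(p_values)
    pooled_p = chi_squared_tail_even_df(chi_squared, degrees_of_freedom)
    return chi_squared, degrees_of_freedom, pooled_p

# Hypothetical probabilities from three independent analyses.
chi_squared, df, pooled_p = pool_probabilities([0.20, 0.08, 0.03])
print(round(chi_squared, 2), df, pooled_p)
```

With 15 independent probabilities the total would be referred to Chi-squared with 30 degrees of freedom, for which the 1 per cent value is 50.89, the figure that appears in Table 10.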

Two problems, then, remain. The first concerns the de¬ 
termination of the significance of the difference between the 
mean magnitudes of individual and trait differences. The 
second relates to the relative importance of learning in the 

¹⁸ R. A. Fisher, Statistical Methods for Research Workers (London: Oliver
and Boyd, 1936, sixth edition), 104-106.

TABLE 10

COMPARISON OF METHODS

R.V./Err.     F = 3.49*        1%   F = 1.42        ratio = 2.5
ΣP's          χ² = 130.85      1%   χ² = 50.89      ratio = 2.6

Methods give equivalent results.

*Values taken from R.V./Err. in Table 9.


repetitive variations factor. The former problem was handled
in the following manner. The per cent of the total variation
contributed by individual differences was determined for each
test (cf. Table 1). The per cent of the total variation contributed
by trait differences was determined for each individual
(cf. Table 2). These two series of per cents were then
tested for the significance of the difference between their
means. Since per cents are distributed as a binomial or Poisson
distribution, it was necessary to transform the original measures
before applying the test of significance.¹⁹ Fisher and
Yates²⁰ have constructed a table of the inverse sine function
which transforms proportions or per cents to angular degrees
and normalizes their distribution. Fisher’s test was applied
to these transformed values, and the obtained value of
“t” was found to be insignificant at even the 50 per cent level.
It was, thus, assumed that individual differences and trait
differences were of comparable magnitude. This view that the
tests are as discrete as the individuals—the specificity view of
motor skills—is in substantial accord with the published conclusions
of such investigators as Perrin, Muscio, Seashore,


¹⁹ Clark and Leonard have contributed an excellent discussion of this point.
Cf. A. Clark and W. H. Leonard, "The Analysis of Variance with Special
Reference to Data Expressed as Percentages," Journal of American Society of
Agronomy, XXXI (1939), 55-66.

²⁰ R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural,
and Medical Research (London: Oliver and Boyd, 1938), p. 90.



Griffitts, and Buxton and Humphreys.²¹ It likewise agrees
with the conception of motor abilities propounded by the
authors of the “Minnesota Mechanical Ability Tests.”²² These
last make reference to the specificity view as “the theory of
unique traits.”
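
The comparison of the two series of per cents described above can be sketched as follows. This is an illustration under assumptions, not the study's own computation: the transformation is computed directly as the inverse sine of the square root of the proportion, expressed in degrees, in place of a table lookup, and the comparison uses a pooled-variance “t” for two independent series. The function names and the sample per cents are hypothetical.

```python
import math
from statistics import mean, variance

def angular_transform(per_cent):
    """Convert a per cent (0 to 100) to angular degrees via the inverse sine."""
    return math.degrees(math.asin(math.sqrt(per_cent / 100.0)))

def pooled_t(series_a, series_b):
    """Pooled-variance t statistic and its degrees of freedom for two series."""
    n_a, n_b = len(series_a), len(series_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(series_a) + (n_b - 1) * variance(series_b)) / df
    standard_error = math.sqrt(pooled_variance * (1.0 / n_a + 1.0 / n_b))
    return (mean(series_a) - mean(series_b)) / standard_error, df

# Hypothetical per cents: one value per test, and one value per individual.
per_test = [70.0, 74.0, 68.0, 73.0, 75.0, 72.0]
per_individual = [78.0, 71.0, 80.0, 76.0, 74.0, 79.0, 75.0]

t_value, df = pooled_t(
    [angular_transform(p) for p in per_test],
    [angular_transform(p) for p in per_individual],
)
print(round(t_value, 2), df)
```

The obtained “t” would then be referred to the usual table with the stated degrees of freedom.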

With respect to the latter problem, then, Table 9 shows
that the repetitive variations factor accounts for approximately
7 per cent of the total variation in the series of analyses
relative to trait differences. Table 5 shows that the initial
(2-4) magnitude of the repetitive variations factor is approximately
10 times its final (6-8) magnitude. This suggests,
as an estimate, that learning is at least 10 times as
important a source of variation as are “random” fluctuations
in the individual’s efficiency from day to day in all functions.
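
The step from a ratio of magnitudes to a share of the repetitive variations factor can be made explicit. The sketch below is one possible reading and not the author's stated computation: it assumes that the final (6-8) magnitude estimates the residual day-to-day fluctuation, and that the initial (2-4) magnitude is the sum of that residual and a learning component. The function name is hypothetical.

```python
def learning_share(initial_magnitude, final_magnitude):
    """Fraction of the initial magnitude left after removing the final (residual) magnitude."""
    if initial_magnitude <= 0:
        raise ValueError("initial magnitude must be positive")
    return (initial_magnitude - final_magnitude) / initial_magnitude

# A ratio of 10 to 1 between initial and final magnitudes, as read from the text.
share = learning_share(10.0, 1.0)
print(f"{share:.0%}")
```

Under this reading a ratio of 10 to 1 corresponds to nine-tenths of the initial magnitude.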

Finally, it may be observed that if individual differences
and trait differences are of comparable magnitude, and if the
repetitive variations factor is of significant magnitude, then
by definition intra-individual differences are greater than inter-individual
differences. This fact was affirmed by determining
the mean per cent contribution of individual differences to the
total variation in the analyses of series one, and of trait differences
plus repetitive variations to the total variation in the
analyses of series two. The two series of per cents were
appropriately transformed via the inverse sine function and
the “t” test was applied to determine the significance of the
difference between their means. The obtained value, confirming
the hypothesis, was significant above the 1 per cent level.

²¹ F. A. C. Perrin, “An Experimental Study of Motor Ability,” Journal of
Experimental Psychology, IV (1921), 24-56.

B. Muscio, “Motor Capacity with Special Reference to Vocational Guidance,”
British Journal of Psychology, XIII (1922), 152-184.

R. H. Seashore, “Individual Differences in Motor Skills,” Journal of General
Psychology, III (1930), 38-66.

C. H. Griffitts, “A Study of Some Motor Ability Tests,” Journal of Applied
Psychology, XV (1931), 109-125.

C. Buxton and L. G. Humphreys, “The Effect of Practice Upon Intercorrelations
of Motor Skills,” Science, LXXXI (1935), 441-442.

²² D. G. Paterson, R. M. Elliott, et al., Minnesota Mechanical Ability Tests
(Minneapolis: University of Minnesota Press, 1930), p. 586.


The Conclusions —In this population, and with respect to
these functions, the following are the conclusions of the
present investigation:

(1) Intra-individual differences were greater than inter-individual
differences.

(2) Individual differences and trait differences were of comparable
magnitude.

(3) Repetitive variations were of approximately one-seventh
to one-eighth the magnitude of individual or trait differences.

(4) Learning accounted for at least 90 per cent of the variation
assigned to the repetitive variations factor.





NEW TESTS* 

Test for Machinists and Machine Operators, by Joseph Tiffin, H. F.
Owen, C. C. Stevason, H. G. McComb, and C. D. Hume. 1942.
An achievement test of technical knowledge for machine shop operations.
For 12th grade through adult level. Time, approximately 50
minutes. Machine or self-scored. Price, 18c per copy; specimen set
25c. Published by Science Research Associates, 1700 Prairie Avenue,
Chicago, Illinois.


The Purdue Pegboard, developed by the Purdue Research Foundation.
1942. A test of manual dexterity and facility for small assembly
work. For high school through adult level. Time, two to four
minutes. Price, $9.75. Distributed by Science Research Associates,
1700 Prairie Avenue, Chicago, Illinois.


Industrial Training Classification Test, Forms A and B, by Charles
Lawshe and A. C. Moutoux. 1942. Discriminates between individuals
likely to profit from industrial training programs and those
likely to fail. For 12th grade through adult level. Time, 35 minutes.
Price, 6c per copy; specimen set 15c. Published by Science
Research Associates, 1700 Prairie Avenue, Chicago, Illinois.


Turse-Durost Shorthand Achievement Test, Form A, by Paul L. Turse
and Walter N. Durost. 1942. Areas sampled are shorthand principles,
shorthand penmanship or outline proportions, punctuation,
paragraphing, sentence structure, and spelling. For first and second
year shorthand students. Time, approximately 50 minutes. Price,
$1.10 per package of 25 tests; specimen set 15c. Published by the
World Book Company, Yonkers-on-Hudson, New York.


The Behavior Cards, by Ralph M. Stogdill. 1941. Designed for use
as individual test-interview with delinquent boys and girls. For
ages 9 to 18. Time, 15 to 30 minutes. Price, $2.50 per complete
set, including specially constructed box, 150 cards, 25 record sheets,
and manual of directions. Distributed by the Psychological Corporation,
522 Fifth Avenue, New York City.


*Prepared by Jane Gilbert.




Byrd Health Attitude Scale, by Oliver E. Byrd. 1941. Designed to
measure health attitudes of the group or individual. For 11th grade
level through college sophomore level. Time, approximately 30
minutes. Price, $1.75 per package of 25 tests. Published by Stanford
University Press, Stanford University, California.


Test on the Effects of War, by Lee J. Cronbach. 1942. A survey
instrument, designed to study morale or confidence of high school
youth. For high school students. Time, approximately 25 minutes.
Price, 1c per re-usable test; 5c per answer sheet. Published
by the State College of Washington, Pullman, Washington.


Traxler Silent Reading Test, Form 4, by Arthur E. Traxler. Revised,
1942. Includes rate of reading, story comprehension, word meaning,
and power of comprehension. For grades 7 to 10. Time, approximately
50 minutes. Price, 7c per copy; specimen set 30c.
Published by the Public School Publishing Company, Bloomington,
Illinois.





MEASUREMENT ABSTRACTS*


Ackerman, Dorothy S. “The Critical Evaluation of the Viennese Tests
as Applied to 200 New York Infants Six to Twelve Months Old.”
Child Development, XIII (1942), 41-53.

Bühler's Viennese Tests for the measurement of development of
infants were given to 200 infants for the purpose of evaluating and
validating them for use with American children. Representative groups
of subjects were used. The procedure followed was that standardized
for the tests. The average developmental quotient score was 106.67
as compared with a score of 100 obtained by Bühler for Viennese children.
Split-test reliability coefficients ranging from .92 to .98, for the
different age groups, were obtained. Suggestions for revising some of
the items are made, but, on the whole, the test is considered to be a
valuable, practical instrument for estimating the development of infants.
L. Bouthilet.


Berger, A. “Test Construction and I. Q. Constancy.” Journal of Exceptional
Children, VIII (1942), 109-111.

Although much attention has been given to the effect of such factors
as changes of environment, schooling, and glandular therapy upon I. Q.
constancy, little emphasis has been placed upon the defects in the tests
themselves as a source of inconstancy of the I. Q. This paper discusses
some of the causes of I. Q. fluctuation. Among those listed are the
fact that the I. Q. varies according to the particular test used to measure
it, that the same test given at different age levels may involve the
use of entirely different types of items, and that the variability of the
groups upon which the tests were standardized may have been different.
L. Bouthilet.


Berger, Arthur and Speevack, Morris. “An Analysis of the Range of
Testing and Scattering Among Retarded Children on Form M of
the Revised Stanford-Binet Scale.” Journal of Educational Psychology,
XXXIII (1942), 72-75.

The authors have found that a large percentage of retarded pupils
increase their scores on the average 3.14 months of mental age when
the tests are extended. The rhyme, digits forward and reversed, word
naming, sentence memory (year XI), response to picture (Messenger
Boy), and problems of fact are among the items most frequently passed
beyond the first zero point. Frequent passing of certain items after a


*Edited by Forrest A. Kingsbury.




year's level of complete failures indicates the possibility of inadequate
scaling for these items. It is suggested that the test should be extended
at least to the point where two levels of failures have been reached, if it
is to be an adequate measure in the clinical examination of retarded children.
Louise T. Grossnickle.


Burt, C. The Factors of the Mind: An Introduction to Factor Analysis
in Psychology. London, Univ. London Press. 1940. pp. xiv +
509.

The Factors of the Mind reviews the field of factor analysis, particularly
the English versions.

The logical methods rather than the results of factor analysis are
discussed in the first section. The primary object of factorial methods
is neither interpretation, which was Spearman’s original concern, nor
statistical prediction, which was Thomson’s original concern. The object
is description. "Mathematically, a factor is simply an average . . .
of certain measurements empirically obtained. Logically, it is simply
a principle of classification—a principle by which both tests (or traits)
and the persons tested may be classified."

The section following describes the similarities among the various
types of factor techniques. The last section is "an actual application
of . . . the problem of temperamental types." The inverted factor
technique is given its most complete review in this section. An appendix
contains working methods and tables for computers. Helen Wolfle.


Canady, H. G., Buxton, C. and Gilliland, A. R. "A Scale for the
Measurement of the Social Environment of Negro Youth." Journal
of Negro Education, XI (1942), 4-13.

Seventeen environmental factors (social contacts, cultural, educational,
home, etc.) considered by competent judges important for the
mental development of Negro youth of high-school age are incorporated
into an hour’s interview. The subject’s response on each factor is rated
by the interviewer from 1 to 5 with the aid of a scoring key, yielding
a total possible score ranging from 17 to 85. The Environmental Inventory
items are less of the socio-economic type than those in the Sims
scale (with which it correlates .73 ± .04); and it has some relation
(r = .32 ± .06) to intelligence, as compared with the Sims scale
correlation with intelligence, which was found to be .16 ± .05.
F. A. Kingsbury.


Carter, H. D. "How Reliable are the Common Measures of Difficulty
and Validity of Objective Test Items?" Journal of Psychology,
XIII (1942), 31-39.




Subject-matter tests taken by 200 psychology students were analyzed
to determine the relative reliability of various measures upon which
item selection may be based. Results indicated that accurate measures
of item difficulty may be obtained from a representative group of as few
as 25 students. The common measure of the power of test items to
discriminate between good and poor students yielded a reliability coefficient
of .46. The author concludes that a test may be improved more
easily by basing selection on a measure of difficulty than on a measure
of discrimination power. L. Birdsall.


Crissy, William J. E. and Flanagan, John C. “A Plan for Using
Punched Cards in Presenting Test Results in Profile Form.” Journal
of Applied Psychology, XXVI (1942), 94-105.

The importance of keeping test results in profile form is urged by
psychologists, counselors, and personnel officers. The authors have
developed a method for graphic presentation of test results of the National
Teacher Examinations of the American Council on Education,
including a maximum of fifteen scores for each examinee. Scores on the
tests used are reported on a common scale on which a specific score
indicates a comparable degree of excellence for any one of the various
tests. They are reported quickly, inexpensively, and in convenient
form for permanent filing. The procedures for punching and interpreting
the profile card are given in the appendix. K. S. Yum.


Dearborn, Walter F. and Rothney, John W. M. Predicting the Child's
Development. Cambridge, Sci-Art Publishers. 1941. pp. 360.

This report is based on the Harvard Growth Study, an investigation
of physical and mental growth. Numerous tests and statistical
procedures have been applied in an effort to determine constancy or
variability of growth in intelligence, educational achievement, body size,
ossification, and other characteristics. Jane Gilbert.


Deemer, Walter L. “A Method of Estimating Accuracy of Test
Scoring.” Psychometrika, VII (1942), 65-73.

When errors of test scoring obey a Poisson frequency law (theoretical
considerations suggest that they do), the method described may
be used for finding the upper fiducial limits of scoring errors per paper.
A criterion is suggested for establishing tolerance limits on scoring errors,
and a method is given (1) for finding the probability of being
wrong in the statement that the tolerance limit is being met for a given



size sample or (2) for finding the size of sample that will make this
probability not greater than some fixed value. (Courtesy Psychometrika.)


Dodd, Stuart Carter. Dimensions of Society. New York, Macmillan.
1942. pp. 944.

This book presents a mathematical approach to society and represents
an attempt to systematize statistical forms and data relative to
society. The theory upon which this study is based is as follows: “Any
quantitatively recorded societal situation (S) can be expressed as a combination
of time (T), space (L), a human population (P), and indicators
(I) of their characteristics . . . each type analyzable into a specified
number of indices each operationally developed by its exponents
and each subdivided into a specified number of class intervals and further
subdivided into a specified number of cases.” While the field of sociology
is emphasized in this presentation, the methodology should be
applicable to all quantifiable data in each of the social sciences. Jane
Gilbert.


Ezekiel, Mordecai. Methods of Correlation Analysis. New York,
John Wiley and Sons. 1941. pp. 531.

Although this book treats statistical procedures largely from the
economic point of view, it should be of general interest to measurement
workers. It does not cover the entire field of statistics; rather, it deals
with the types of relationships between variables. The author has attempted
to bring up to date the interpretation of standard errors and
to point out the application of the logical limitations to graphic curve
flexibility. New and speedier methods of calculation and methods of
estimating reliability of individual estimates are also presented. Jane
Gilbert.


Ferguson, George A. "Item Selection by the Constant Process." Psychometrika,
VII (1942), 19-29.

This paper relates the constant process used in psychophysics to
the problem of item selection. Each test item may be described in
terms of a limen, which is an index of the point at which an item discriminates,
and the standard deviation of the limen, which is an index
of the "goodness" of discrimination. The method developed may be
related not only to the description of items but also to the description
of persons. Thus a person’s ability may be described in terms of a
limen and its standard deviation. (Courtesy Psychometrika.)



Gillette, Annette L. “Relative Difficulty of Tests Within Each Year
Level of Revised Stanford-Binet, Form L, Years Six Through
Twelve.” Journal of Psychology, XII (1941), 125-138.

“The data (from 506 cases) clearly indicate that within year levels
there are variations in the difficulty of tests as measured by the percentage
passing . . . The tables indicate the differences in difficulty
of tests within levels and the reliability of these differences.” Each of
the 42 tests is named and numbered and placed in order of per cent
passing of the total group. The tables will be of great value to the
clinician. Helen M. Wolfle.


Greene, Harry A., Jorgenson, Albert N., and Gerberich, J. Raymond.
Measurement and Evaluation in the Elementary School. New York,
Longmans, Green, and Company. 1942. pp. 639.

This book has been designed as a handbook of measurement for elementary
school teachers and students of elementary education. Particular
attention has been given to the problems involved in the construction,
use, improvement, and interpretation of teacher-made examinations
and tests. Important changes and trends in curriculum organization,
instructional techniques, and in measurement and evaluation
techniques have been incorporated in this edition, which is a revision
of an earlier text.

The authors discuss types of educational and mental tests, the criteria
of a good examination, construction and use of various types of standardized
tests, the nature and use of intelligence and personality tests, measurement
and remediation in specific academic areas, and finally, the use
of test results for guidance purposes. While the book is directed at the
elementary level, it should also be of general interest to measurement
workers and teachers at all levels, particularly with reference to the discussions
on test construction and standardization. Jane Gilbert.


Grossnickle, Louise T. “The Scaling of Test Scores by the Method
of Paired Comparisons.” Psychometrika, VII (1942), 43-64.

The purpose of this study is to investigate, by the method of paired
comparisons, a possible scaling of individuals who have made certain
test scores, such that the additive property will be satisfied and such that
a stability in scaling will be maintained—in other words, a scaling such
that the scaled score of an individual will remain relatively the same
regardless of the grouping of individuals in which he may be placed.
The results show that it is possible to utilize psychophysical methods
in psychological and educational test situations. Among the major findings
are that Case V of the Law of Comparative Judgment is appli-



cable to the data in this problem, the method of dividing the intermediate
category equally between the greater and the less was the best
of three possible methods, internal consistency was satisfied, and, finally,
when a new test of stability was applied, it was found that the distances
between the hypothetical individuals remain the same. (Courtesy
Psychometrika.)


Guest, L. P. "Last vs. Usual Purchase Questions." Journal of Applied
Psychology, XXVI (1942), 180-86.

The use of the questionnaire in market research has led to increased
interest in the problem of the form of the question to enable the respondent
to answer as easily and correctly as possible. The problem is to
determine the difference between the two forms, "last purchase" and
"usual purchase," in the questionnaire. The writer had two groups
consisting of 438 college students, representing "last purchase" and
"usual purchase" groups. Each student was asked to answer a questionnaire
of 24 questions. The results show that the two questions give
comparable answers for the most part when the results are treated for
groups rather than individuals. In measuring the brand preferences,
trends could be established equally well by either one of the two forms.
K. S. Yum.


Guilford, J. P. "A Simple Scoring Weight for Test Items and Its
Reliability." Psychometrika, VI (1941), 367-74.

It is pointed out that the scoring weights for test items should be
approximations to regression-equation weights. For this reason any
estimate of reliability of the weight should not be permitted to influence
the size of the weight but should be used in determining the limit of
acceptability of an item. A simple approximation weight is recommended
for general use, and an abac is provided for the estimation of it
when the correlation between item and criterion is the phi coefficient.
A formula for the standard error of this weight is derived and tables
of significant and very significant weights are presented in terms of deviation
from the median weight. (Courtesy Psychometrika.)


Helson, Harry. "Multiple-Variable Analysis of Factors Affecting Light¬ 
ness and Saturation.” American Journal of Psychology, LV (1942), 
46-57. 

Factors affecting judgments of lightness (brightness) and saturation 
were evaluated through the use of analysis of variance. Judgments were 
made on an eleven-point scale running from zero to 10 for each attribute. 



All computations are shown and explained in detail. Judgments of satu¬ 
ration were significantly affected by background (white, gray, or black), 
intensity of illumination, hue, and the interaction of hue and background. 
Background was most important. Judgments of lightness were sig¬ 
nificantly affected by background, intensity of illumination, and hue. Il¬ 
lumination was most important. Helen M. Wolfle. 


Holliday, Frank. “A Survey of an Investigation into the Selection
of Apprentices for the Engineering Industry.” Occupational Psychology,
XVI (1942), 1-19.

The use of a battery of intelligence and aptitude tests improved the
selection of English trade and engineer apprentices. Improvement was
shown by a decreasing number of failures on national examinations, by
foremen’s satisfaction with the greater aptitude of their new apprentices,
and by studies of the correlations between test scores and later success.
Intelligence scores correlated with later success in mathematics, and
aptitude scores with success in drawing. High intelligence scores alone
were insufficient in predicting either the good trade or the good engineer
apprentices. Helen M. Wolfle.


Holzinger, Karl J. and Harman, Harry H. Factor Analysis. Chicago,
Univ. Chicago Press. 1941. pp. 417.

This book has been written to present the various approaches to
the problem of factor analysis. The analytic and geometric bases for
factor analysis are discussed as well as the theoretical development of
various types of solution. Numerous practical illustrations are cited
together with complete calculations. Jane Gilbert.


Jurgensen, Clifford E. “A Two-Dimensional Rating Scale.” American
Journal of Psychology, LV (1942), 255-60.

A two-dimensional rating scale developed for use in a boys’ camp
consists of ten traits or questions, each of which forms a scale representing
five types of behavior. The first and fifth are apparently two opposite
terms descriptive of the same trait; the middle or third is normal
or average; and the second and fourth supposedly fall between the average
and the extremes. The second dimension indicates the frequency
of each type of behavior in terms of seven different degrees, such as
constantly, almost always, usually, frequently, sometimes, hardly ever,
and never. Administration of the scale and the scoring system are described.
K. S. Yum.




Katz, Daniel. “Tasks in the Measurement of Public
Opinion.” Journal of Consulting Psychology, VI (1942), 59-65.

Polling opinion is useful as background for any successful campaign
for influencing people. In fact, it is basic to the democratic process.
In addition to such practical utility, it is important in the development
of the science of social psychology. Many of the significant problems
of social psychology which are difficult to handle in the laboratory can
be profitably approached through the field study which ascertains attitudes
and opinions. The author reviews the existing organizations and
agencies as well as their type of work, and describes fundamental training
and equipment for this field of public service. Louise T. Grossnickle.


Kent, G. H. “Emergency Battery of One-Minute Tests.” Journal
of Psychology, XIII (1942), 141-57.

A battery of brief tests is given, suitable for use as a preliminary
measure in psychiatric examinations or under conditions in which
presentation of longer, more formal tests is not feasible. Five oral tests
and seven written tests are described. Some of the tests have not been
standardized, and others are revised forms of previously published tests.
The value of the tests for use in the military situation is emphasized.
L. Bouthilet.


Link, Henry C., and Freiberg, A. D. “The Problem of Validity vs.
Reliability in Public Opinion Polls.” Public Opinion Quarterly,
VI (1942), 87-98.

In spite of the fact that public opinion polls have attained the status
of a scientific instrument and issues of national and international
importance are being considered with reference to poll results, their use
and interpretation are subject to error. In order to make possible an
evaluation of reliability, a statement of the size and distribution of the
sample of population interviewed should accompany each poll. High
reliability, however, does not insure validity. One important check
on the dangerous tendency to accept poll results uncritically has been
their validation by periodic election returns. Validation by comparison
with specific purchasing behavior is also feasible. Questions on public
attitudes and action should be framed in specific and behavioral terms
rather than in general, stereotyped language. A discussion of various
other practical techniques of validation is given, with the conclusion
that the basic criterion of validity is behavior. L. Bouthilet.


Marble, Samuel D. “A Performance Basis for Employee Evaluation.”
Personnel, XVIII (1942), 217-226.

Better efficiency ratings can be secured from rating scales when
their items deal with actual behavior on the job rather than with
personality traits. After the descriptions of the job items are secured,
they are evaluated. The relative importance of each behavior item to the
job in question can be obtained by the psycho-physical method of
equal-appearing intervals. Only items on which there is agreement among
the judges are included in the final scale. Such a scale encourages the
supervising officer to distinguish between the descriptive and evaluative
function, and makes his task more palatable. Helen M. Wolfle.


Marshall, M. V. “A Study of the Stanford Scientific Aptitude Test.”
Occupations, XX (1942), 433-434.

The test was administered to 47 students at the end of their
sophomore year or the beginning of the junior year. Scores were then
correlated, by the product-moment method, with the average science grade
in the freshman and sophomore years, average science grade in the junior
and senior years, and the average chemistry, physics, and biology grades
for all four years, respectively. Twenty-five students took the test twice,
once at the end of the sophomore year and again during the senior year.
The results show that the test possesses high reliability but rather low
validity. The author feels, therefore, that its practical utility with
college students for the purpose of vocational guidance is open to question.
K. S. Yum.


McNemar, Quinn. “On the Number of Factors.” Psychometrika,
VII (1942), 9-18.

A proposed criterion for the number of factors is developed on the
basis of the similarity between a factorial residual and the partial
correlation coefficient; something is known concerning the sampling error
of the latter. Instead of computing the residuals as partials, a formula
is presented for adjusting the standard deviation of the distribution of
residuals so as to approximate the S.D. of the residuals as partial
correlations. The criterion requires that factors be extracted until the
adjusted S.D. reaches or falls below 1/√N. When tried out on six
samples drawn from six universes of known factorial description, the
criterion indicated the correct number of factors each time. The
requisites of situations adequate for such empirical checks are discussed.
(Courtesy Psychometrika.)


McQuitty, Louis L. “Conditions Affecting the Validity of Personality
Inventories: I, II, III.” Journal of Social Psychology, XV
(1942), 32-52.

These three articles deal with conditions affecting the validity of
personality inventories. The method of the study is to compare certain
conditions affecting personality inventories with analogous conditions
for intelligence tests as to nature of test items, content, directions to
subjects, and the scoring of item responses; as to techniques of test
construction, interrelations of item scores or answers, the selection and
elimination of items, and scaling and scoring; finally, as to the nature
of individual differences as influenced by both hereditary and
environmental conditions. The author suggests possible ways of increasing
the validity of the personality inventories. K. S. Yum.


Owens, W. A., Jr. “A New Technic in Studying the Effects of Practice
upon Individual Differences.” Journal of Experimental Psychology,
XXX (1942), 180-183.

R. A. Fisher’s analysis of variance is suggested as a technique for
obtaining an estimate of the effects of practice upon individual
differences. The technique was applied to a study of motor skill tests,
with individual differences and test administrations used as criteria of
classification. The results show a tendency for individual differences to
increase slightly, but statistically insignificantly, with practice. This
tendency to remain constant suggests the importance of the initial
selection program in industry. George W. Boguslavsky.


Reid, Seerley. “Respondents and Non-respondents to Mail
Questionnaires.” Educational Research Bulletin, XXI (1942), 87-96.

The accuracy of mail questionnaire results is difficult to estimate
because of partial responses. In a study of the use of radios in Ohio
schools an analysis was made of the difference in replies between first
respondents, those responding to a follow-up letter, and a sample of the
remaining group who responded only after intensive persuasion.
Statistically significant differences were found between the groups,
demonstrating that if the replies of the first group, or even the first
and second groups together, had been used, erroneous conclusions
would have ensued. Implications of the study for other investigations
of the same type are that follow-up methods are necessary, that a
representative sample of the non-respondents may be used to indicate the
trend of their answers, and that in cases in which a follow-up
questionnaire cannot be employed, the possibility of error must be
recognized. L. Bouthilet.


Rodeheaver, Newton and Grim, Paul R. “Tests in Civics and
Citizenship, Part II.” Social Education, VI (1942), 222-224.

This is the second installment of a bibliography of tests of various
aspects of knowledge and attitude in the field of government. General
headings include tests on the Declaration of Independence, the United
States Constitution, community affairs, current affairs, and attitudes
and beliefs. The objectives of the test, school grades for which it is
suited, and a critical comment accompany each title. L. Bouthilet.


Slater, P. “Notes on Testing Groups of Young Children.”
Occupational Psychology, XVI (1942), 31-38.

The basic principle of securing rapport and constancy of testing
conditions among different groups of subjects, especially among young
children of different ages, is a very important one. The author is
particularly concerned with some of the conditions for the administration
of the N.I.I.P. Group Test 70 on groups of children who are 11, but
not yet 12 years old, and who are 13, but not yet 14 years old,
respectively. In administering the test, the psychologist should consider
the particular age group he is testing to secure the psychological
condition of clear understanding and to meet the types of difficulty that
are likely to arise. Louise Grossnickle.


Swineford, Frances. “Some Comparisons of the Multiple-Factor and
the Bi-Factor Methods of Analysis.” Psychometrika, VI (1941),
375-82.

Bi-factor and multiple-factor analyses of the same data are
compared in two respects. First, two criteria are suggested for determining
when the factorization is adequate. This problem being more acute
for the centroid method than for the bi-factor method, the latter is used
primarily for comparison only. It is shown also that the omission from
the simple structure of entries smaller than .10 yields a pattern which is
a poorer fit to the original correlations than is the bi-factor pattern.
Second, the second-order general factor obtained from the intercorrelations
of the primaries is found to be highly correlated with the general factor
of the bi-factor pattern. (Courtesy Psychometrika.)


Toops, Herbert A. “Code Numbers as a Means of Scoring
Group-Administered Performance Test Products.” Journal of Applied
Psychology, XXVI (1942), 136-50.

Among the chief obstacles to the establishment of an adequate
performance test program for guidance are the time and skill involved in
the cost of scoring, and the delay between subsequent tests administered
using the same equipment. In view of the fact that all mechanical
performance test products have as a common feature space arrangements
of movable sub-parts of a whole, and consequently exist in only a
limited number of ways or patterns of correct and partially correct
products, the author suggests the employment of “Addends” as a means of
quick and certain identification of such performances. He shows in
detail how to apply this addend principle by illustrations of fish-pole
assembly and a bolt-and-washer assembly. K. S. Yum.





EDUCATIONAL AND PSYCHOLOGICAL 

MEASUREMENT 


Volume II OCTOBER, 1942 Number 4 


Two Announcements ........................................ 331

Aptitude Tests for Army Weather Observer Students ......... 335
    Earle Cleveland, Richard W. Faubion, and Thomas W. Harrell

The Optimum Use of Test Data .............................. 339
    Maurice Lorr and Ralph K. Meister

A Technique for Testing Understanding of the Visual Arts .. 349
    Melvin W. Barnes

Some of the Less Measurable Outcomes of Education ......... 353
    Edwin J. Brown

The Aims, Objectives, and Outcomes of the Ohio Testing
  Program ................................................. 361
    Ray G. Wood

Educational Requirements and Occupational Levels .......... 371
    Richard D. Allen and Lesley F. Krone

The Prediction of Success of Student Assistants in
  College Library Work .................................... 379
    Grace M. Oberheim

The Administration of Group Tests ......................... 387
    Ernest M. Ligon

The Purpose, Origin, Plan of Procedure, and Values of
  the Nation-Wide Every Pupil Scholarship Tests ........... 401
    H. E. Schrammel

A Test for Selecting and Training Industrial Typists ...... 409
    Clifford E. Jurgensen

Measurement Abstracts ..................................... 427

Index for Volume II ....................................... iii













Copyright, 1942, by 

SCIENCE RESEARCH ASSOCIATES 




PRINTED IN THE UNITED STATES OF AMERICA 



TWO ANNOUNCEMENTS 


Although Educational and Psychological Measurement is
several months short of celebrating its second anniversary, the
growth it has made in its brief life is notable. Now it is
possible to announce that another step forward has been taken.
With this issue, Educational and Psychological Measurement
for the first time goes to the members of the American College
Personnel Association as their official journal. That this
arrangement will result in a strengthening and broadening of
the journal goes without saying.

Educational and Psychological Measurement will continue
to serve the whole field of measurement as applied in
education, industry, and government, and the pages of the journal
will continue to be open to contributions from the entire field.
In the past a number of outstanding articles have been
contributed by members of the American College Personnel
Association, although there was no tangible relation between the
Association and the journal. It is a source of satisfaction that
beginning with the January, 1943, issue the Association will
be represented regularly by contributions from its membership
in accordance with the new arrangement. A section on news of
the Association will also appear in future issues.

An announcement to the members of the American College 
Personnel Association from its president follows. 

G. Frederic Kuder, Editor

To the Members of the American College Personnel 
Association: 

It is with a great deal of assurance regarding the future of
our Association that I announce that an almost unanimous
ballot approving the Executive Council's recommendation
regarding our affiliations with this magazine has been received.
The tremendous pressures that have been building up on our
many members with regard to war work have caused your
Executive Council to spend considerable time thinking of ways
of fortifying our Association during the war period to the
end that our corporate existence would continue. With the
unusually fine co-operation of the editorial board of this
magazine and of Science Research Associates we can be assured
of more frequent contact with each other.

As president of your Association I want publicly to
acknowledge the splendid work done by our Secretary, Dr.
Feder, who conceived and initiated the plan for affiliation
with this magazine as our official publication. Now Dr. Feder
can go to war with the satisfaction of a job well done.

Unfortunately, the ballot voting was not completed in
time for materials to be prepared for this issue. In future
issues, however, articles and news notes will be presented.
Grace E. Manson, Director of Personnel Research,
Northwestern University, Evanston, Illinois, has agreed to act as
editor for the A. C. P. A. section of the journal. All of us
owe Dr. Manson a debt of gratitude for undertaking this
service to our organization. I hope that each of you will feel
responsible for making suggestions to this editor with regard
to desirable materials to be included. News notes should also
be sent to her. Please feel that your Executive Council wishes
to do everything possible to further the work of our
individual members and the continued strength of our Association.



Your voting was almost unanimous in favor of a restricted
meeting in St. Louis in February. It appears that other
personnel associations will follow our lead. We hope that this
will mean that our program, though restricted, may be a
significant one. The Proceedings in abbreviated form most likely
will appear in this journal.

As each of you finds yourself applying your personnel skills
to the war work, may I urge you to keep clearly in mind that
our contributions to higher education are needed more than
ever these days for two reasons. First, all that college life
adds to the maturing of students is as necessary in wartime
as in peacetime. Our contribution through counseling of
college students is being more clearly seen as a significant part of
higher education. Second, we need to strengthen our
techniques and our Association in preparation for the after-war
period when tremendously increased enrollments may present
the colleges with rich opportunities for affecting in a greatly
increased manner the welfare of war students.

Your Executive Council extends to each of you a most 
cordial congratulation on your significant contributions to the 
war and wishes each of you continued effectiveness. 

Cordially yours, 

E. G. Williamson, President
American College Personnel Association 






APTITUDE TESTS FOR ARMY WEATHER
OBSERVER STUDENTS¹

EARLE CLEVELAND and CAPTAIN RICHARD W. FAUBION 
Army Air Forces, Technical Training Command 

and 

THOMAS W. HARRELL 
University of Illinois 

Two groups of weather observer students in the Army
Air Forces Technical Schools have been studied to find
those tests which would be most predictive of success in the
course.

The first group of students, numbering 116, entered the
course in August 1940; the second, numbering 73, entered in
November 1940. These students, like others described in
previous studies² of selection procedures in the Army Air
Forces Technical Schools, were selected on the basis of their
being high-school graduates, with a score on a revised form of
Alpha equivalent to a percentile rank of 75 and with a
minimally acceptable score on a shop mathematics test.

The criteria with which the prediction test scores were
compared are two grades in the weather observer course: the
first based on a meteorology examination given about three
weeks after the beginning of the course and the second being
the final average for the three-month course. The grade on
the meteorology examination correlated .70 with the final
course average. The weather observer course covered the
following topics:

1. Wind-aloft charts
2. Atmospheric soundings
3. Upper air observations
4. Plotting map signals
5. Surface observations
6. Weather forms
7. Weather instruments

¹A paper read at the 1941 meeting of the Midwestern Psychological
Association.

²T. W. Harrell and R. Faubion, "Selection Tests for Aviation Mechanics,"
Journal of Consulting Psychology, IV (1940), 104-105.

T. W. Harrell and R. Faubion, "Primary Mental Abilities and Aviation
Maintenance Courses," Educational and Psychological Measurement, I (1941),
59-66.

After the content of the course had been studied, a
tentative series of tests was chosen. These tests were:

(1) Mental Alertness. An adaptation of the Henmon-Nelson Test
for high-school students in which some of the items have been
changed.

(2) Scattered X's. A measure of perceptual speed in which the
problem is to cross out the x’s placed at random on a page of pied
type.

(3) Identical Numbers. A measure of perceptual speed in which
the problem is to select which numbers in a column are identical
with the number at the top of the column.

(4) Algebra. A standard test of algebra.

(5) Meteorological Achievement. 50 true-false items based upon
material used in the Weather Observer course. (This test is not to
be confused with the meteorology examination, which is one of the
criteria.)

(6) Physics Achievement. 144 true-false items based upon
material used in the Weather Observer course.

(7) Surface Development. Six problems, each with six parts, in
which a picture and a diagram of a simple object are shown, the
problem being to match corresponding parts of the picture and the
diagram.

(8) Flags. 48 items in which the problem consists of deciding
whether pairs of pictures of flags represent the same or opposite
faces of the flags.

(9) Mechanical Movements. 22 problems based on pictures of
various mechanical movements as, for instance, a question about the
direction in which oil will be forced, based upon a picture of the
gears of a rotary oil pump.

(10) Cubes. 32 problems in each of which the task is to distinguish
whether or not two drawings represent the same cube turned to
different positions.

Tests 2, 3, 7, 8, 9, and 10 are taken from Dr. L. L.
Thurstone’s Primary Mental Abilities study and were used with his
permission. Test 4 was also used with Dr. Thurstone’s
permission.




For the students entering in August, the means and the
standard deviations of each of the ten prediction tests and of
the two criteria as well as the zero-order correlations between
the predictor variables and the criteria are shown in Table 1.

TABLE 1

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS FOR TEST SCORES
AND GRADES FOR 116 W. O. STUDENTS

                                                   Correlation   Correlation
                                                      with          with
                                        Standard   Meteorology   Final Course
Variables                       Mean    Deviation     Exam         Average

Meteorology Examination         68.5      16.8         --           .70
Final Course Average            80.6       4.9        .70           --
Mental Alertness                70.0      10.8        .39           .39
Scattered X's                   26.2       9.2        .11           .11
Identical Numbers               50.6       6.3        .28           .32
Algebra                          3.1       2.3        .47           .41
Meteorological Achievement      10.8       7.3        .40       [illegible]
Physics Achievement             33.9      24.0        .55           .45
Surface Development             21.4       6.8        .18           .11
Flags                           24.5      10.7        .16           .10
Mechanical Movements            22.7      12.0        .41           .32
Cubes                           15.8       6.8        .17           .17


It will be noted that, in spite of the previous selection of the
subjects by means of a test of mental ability, the mental
alertness test correlated .39 with each of the two criteria. The
meteorological achievement test, devised by the Classification
Division, A.A.F.T.T.C., to measure meteorological concepts,
correlated .40 with the grade on the meteorology examination.
The physics achievement test and the algebra test correlated
.55 and .47, respectively, with the same criterion. Other
correlations between test scores and the grade in meteorology
were positive but not so high. The Surface Development Test,
which has consistently correlated significantly with grades in
the basic mechanical course at Air Corps Technical Schools,
correlated positively but insignificantly with the Weather
Observer course grades, which is not inconsistent with what
one would expect.




A multiple correlation between the grade on the
meteorology examination and the best combination of the tests
resulted in a correlation coefficient of .63. The tests included
were mental alertness, meteorological achievement, physics
achievement, and algebra. A combination of the mental
alertness, the meteorological achievement, and the physics
achievement tests yielded a multiple correlation coefficient of .62. The
difference between the two coefficients was not enough to
warrant the additional testing time necessary for the algebra test.
The regression equation was

    X_0 = .29 X_1 + .38 X_2 + 30.34,

where X_0 = the most probable meteorology examination
            grade,

      X_1 = the meteorology aptitude test score
            (meteorological achievement plus physics
            achievement),

      X_2 = the mental alertness test score.

This regression equation, based on the August class, was 
used to predict the results for the class entering in November. 
Of the 73 November students, 10 were eliminated before the 
completion of the course; of these 10, 8 fell below a critical 
level of 55, calculated from the regression equation. Of the 
63 students who completed the course, nine passed who were 
predicted to fail. 
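
The screening step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure as computed in 1940: the function and argument names are hypothetical, and it assumes that the .29 weight applies to the meteorology aptitude score and the .38 weight to the mental alertness score, since the printed equation does not make the pairing fully legible.

```python
# Sketch of the regression screen used in the cross-validation.
# Assumption: .29 weights the meteorology aptitude score and .38 weights
# the mental alertness score. All names here are hypothetical.

CRITICAL_LEVEL = 55.0  # critical predicted grade cited in the text


def predicted_exam_grade(aptitude: float, alertness: float) -> float:
    """Most probable meteorology examination grade from the two scores."""
    return 0.29 * aptitude + 0.38 * alertness + 30.34


def predicted_to_fail(aptitude: float, alertness: float,
                      critical: float = CRITICAL_LEVEL) -> bool:
    """True when the predicted grade falls below the critical level."""
    return predicted_exam_grade(aptitude, alertness) < critical


if __name__ == "__main__":
    # Illustrative scores only; they are not taken from the study.
    for aptitude, alertness in [(44.7, 70.0), (20.0, 45.0)]:
        grade = predicted_exam_grade(aptitude, alertness)
        print(aptitude, alertness, round(grade, 2),
              predicted_to_fail(aptitude, alertness))
```

Under this pairing of weights, a student with an aptitude score of 44.7 and an alertness score of 70.0 receives a predicted grade of about 69.9, well above the critical level of 55.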

Conclusions: 

(1) Evidence is given from a cross-validation study that 
an examination made up of mental alertness, meteorology, and 
physics questions significantly improves the selection of 
weather observer students in the Army Air Forces Technical 
Schools. 

(2) One of the tests, Surface Development, which has
been shown to be predictive of basic airplane mechanics grades,
does not correlate significantly with weather observer grades.
This suggests the specificity of requirements for the various
training courses within the Army Air Forces Technical Schools
and indicates that a single selection procedure is inefficient.





THE OPTIMUM USE OF TEST DATA¹


MAURICE LORR
and
RALPH K. MEISTER

The procedures conventionally adopted in administering
and scoring age scales of the Binet type are often
wasteful of time and test materials. For many practical situations, a
more economical procedure is much needed. It is the purpose
of this paper to describe a briefer method of administering
and scoring age scales, to indicate the several advantages of
the newer method, and to present comparisons of the results
of this newer method applied to the Revised Stanford Binet
with results from the conventional form of the scale and from
the abbreviated scales.

The rationale for the method to be described is directly
derived from the fundamental relationships between the field
of mental test theory and the methods of psychophysics, in
particular the constant method of psychophysics. Let us first
briefly review the procedure in determining a sensory threshold
such as the two-point tactual limen by the constant method.
An appropriate range of stimuli that are judged neither “two”
nor “one” 100 per cent of the time is selected. Each stimulus
is then administered to the subject by means of the
aesthesiometer a large number of times in a prearranged order. The
subject judges the presence or absence of the desired
experience, which is “two.” The responses are then classified and the
relative frequencies of judgments of “two” and “one” for each
grade of the stimulus scale are determined. The limen, which
may be computed by the constant process, represents a
transition zone between stimuli too weak to arouse a response of
“two” and stimuli strong enough to elicit a response of “two.”
Conventionally the stimulus value that elicits a response of
“two” 50 per cent of the time is regarded as the stimulus
limen.

¹The authors wish to express their thanks to Dr. Martin L. Reymert,
Director of the Mooseheart Laboratory for Child Research, for his generous
permission to use data from its files.

Now let us consider in the mental age scale the groups of
items, supposedly equal in difficulty, that are allocated to each
year level as representing the typical performance of
individuals of the corresponding chronological age. These items
are such that the response is classed as either “correct” or
“incorrect.” Each item may be regarded as having a
characteristic response-value that differentiates “incorrect” from
“correct” responses, and thus each item requires for a
“correct” response a given degree of ability as expressed in terms
of a certain age group. This response-value corresponds to
the stimulus value of psychophysical discrimination.
Theoretically, if an individual were presented with items ordered as to
difficulty and if his responses were made without measuring
error, correct responses would be made up to a certain point
on the scale depending upon the individual’s ability.
“Incorrect” responses would be made to all items beyond this point,
and the scale value of this point, which corresponds to the
psychophysical limen, would represent a measure of the
individual’s ability or intelligence.

Actually, of course, as with sensory thresholds, no such
point exists. In actual practice, instead of this sharp
theoretical division we obtain mixed successes and failures over
a number of year levels. It has been pointed out (1) that
such irregularity of performance or scatter is in part a
consequence of the lack of perfect correlation between items
resulting from a lack of homogeneity and from the presence of
error. In the mental test situation, therefore, it is seen that
the response process may be regarded as a composite consisting
of a characteristic or “true” component of ability and an error
component. When the ability component of the individual plus
the chance error component of his response is greater than the
level of ability required to pass the item, he answers correctly;
when this composite is less, he answers incorrectly. The
discrepancy between the actual level of the response and the
assumed true value, of course, constitutes the error.

It is evident from these considerations that the psycho¬ 
physical method of constant stimuli for determining stimulus 
thresholds is applicable to such mental test data. Thus, when 
items are arranged in order of difficulty for standard age
groups, and the response of any individual to any item can
take only two values such as “correct” or “incorrect,” the
frequency distribution of responses as a function of item 
difficulty may be assumed to be the integral of the normal 
probability curve. The characteristic response-value or test 
limen of that individual (his mental age score equivalent) will 
be that difficulty value expressed in terms of age that yields 
“correct” responses fifty per cent of the time. The individual’s 
variability or error will be the standard deviation of the prob¬ 
ability function described (2). 

Thus, by simply computing a test limen for an individual 
in terms of the age level at which he passes 50 per cent of the
items, we have an alternative method of determining mental 
age. This procedure is much shorter than that required for 
the full scales and even shorter than that required for the 
abbreviated scales. The test limen or mental age is determined 
either by (a) the single age level at which the individual passes 
50 per cent of the items, or by (b) simply interpolating for 
the 50 per cent point which falls between the age level at 
which he has passed more than 50 per cent and the next higher 
level where he has passed less. Linear interpolation is justi¬ 
fiable here since in the range concerned the curve of per cent 
passing is practically linear, all of the data will ordinarily be 
employed, and a measure of individual scatter is not desired. 
The limen or mental age score may be computed by linear 
interpolation by the following formula: 


$$\text{M.A.} = a_m + \frac{(a_l - a_m)\,(.50 - p_m)}{(p_l - p_m)}$$

where $a_m$ = the age at which more than 50 per cent of the
items were passed,

$a_l$ = the age at which less than 50 per cent of the items
were passed,

$p_m$ = the per cent of correct responses at $a_m$ (expressed
as a decimal fraction),

$p_l$ = the per cent of correct responses at $a_l$ (expressed
as a decimal fraction).
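
As an illustration only, the interpolation formula above can be sketched in a few lines of Python. The function name `limen_mental_age` and the choice to express ages in months are assumptions made for this sketch; they do not appear in the article.

```python
def limen_mental_age(a_m, a_l, p_m, p_l):
    """Linear interpolation for the 50 per cent point (the test limen).

    a_m: age level at which more than half of the items were passed
    a_l: adjacent age level at which less than half were passed
    p_m, p_l: proportions of items passed at a_m and a_l
    Ages are taken in months here (an assumption of this sketch).
    """
    if not (p_m > 0.5 > p_l):
        raise ValueError("interpolation needs p_m > .50 > p_l")
    return a_m + (a_l - a_m) * (0.5 - p_m) / (p_l - p_m)


# Five of six items passed at age VII (84 months),
# two of six items passed at age VIII (96 months).
ma_months = limen_mental_age(84, 96, 5 / 6, 2 / 6)
print(round(ma_months))  # 92 months, i.e. 7 years 8 months
```

The guard clause reflects the condition stated in the text: the formula applies only when the two levels lie on opposite sides of the 50 per cent point.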

The procedure of the examination itself is as follows:
Begin testing at the age level where the child is likely to pass
half and fail half of the items. On the average this age level
will be within six months of the child’s chronological age.
The examiner should, of course, also take into account the
grade placement, general behavior, and any additional facts
available concerning the child's ability. If the child responds
correctly to 50 per cent of the items at the age level where
testing is begun, his test limen or mental age score is exactly
that year level, and the test is completed. Should the child
respond correctly to more than 50 per cent of the items, the
items at the next higher (older) level are administered. Test¬ 
ing is continued until 50 per cent of the items or less are 
passed, and in most cases only one additional level is required. 
In only a few cases is more than one additional level of testing 
required. This is what might be expected from an assumption 
of a normal distribution of intelligence in the general popula¬ 
tion. 

Should the subject respond correctly to less than 50 per
cent of the items at the level where testing is begun, the items
at the next lower (younger) level are administered and the
examination is continued until a point is reached where the
subject passes either 50 per cent or more than 50 per cent of
the items. It should be noted that one level determines the
test limen or at most two. If the subject has been tested
through three or four levels before a limen determination is
possible, as may sometimes happen, the only data that are to
be used in determining his mental age score are the level at
which he passes 50 per cent of the items or the two adjacent
levels at which he passes more and less than 50 per cent of the
items, respectively. The rest of the test data is ignored.
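
The testing procedure just described amounts to a short search up or down the age levels. The following Python sketch is one possible reading of it, not the authors' own statement of the method; the function `find_limen_levels` and the callable `passed_fraction` are hypothetical names introduced for illustration.

```python
def find_limen_levels(passed_fraction, levels, start):
    """Walk up or down the age levels until the 50 per cent point is found.

    passed_fraction: hypothetical callable, level -> proportion of items passed
    levels: age levels ordered from youngest to oldest
    start: index in `levels` at which testing begins
    Returns ("exact", level), ("bracket", (lower, upper)) or ("none", None).
    """
    i = start
    p = passed_fraction(levels[i])
    if p == 0.5:                       # exactly half passed: limen is this level
        return "exact", levels[i]
    step = 1 if p > 0.5 else -1        # go older after success, younger after failure
    while 0 <= i + step < len(levels):
        j = i + step
        q = passed_fraction(levels[j])
        if q == 0.5:
            return "exact", levels[j]
        if (q > 0.5) != (p > 0.5):     # the 50 per cent point has been crossed
            lower, upper = (i, j) if step == 1 else (j, i)
            return "bracket", (levels[lower], levels[upper])
        i, p = j, q                    # earlier levels are ignored from here on
    return "none", None                # ran off the scale: no limen determinable


scores = {6: 1.0, 7: 5 / 6, 8: 2 / 6, 9: 0.0}
print(find_limen_levels(scores.get, [6, 7, 8, 9], 1))  # ('bracket', (7, 8))
```

Only the final one or two levels are returned, which mirrors the rule that the rest of the test data is ignored. The exact comparison with 0.5 is safe here because proportions such as 3/6 are represented exactly.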




As Terman has stated, mental ages beyond fifteen are 
artificial and are to be regarded as simply numerical scores. It 
was decided by the present authors to express test limens 
beyond the age of fourteen in terms of these artificial mental 
ages instead of chronological age. The individual’s test limen 
is, therefore, simply the mental age level at which 50 per cent 
of the items are passed. The problem arises as to what mental 
ages should be assigned to the Average Adult and Superior 
Adult I, II, and III levels. At lower age levels, such as fourteen,
a child passing half of the items at that level is credited
with a mental age of fourteen by the liminal method. Similarly,
individuals passing half of the test items at the upper levels
should be assigned the following mental age scores:

Average Adult        15 years, 4 months
Superior Adult I     17 years, 4 months
Superior Adult II    19 years, 10 months
Superior Adult III   22 years, 10 months


MENTAL AGE INTERPOLATION TABLES GIVING THE NUMBER OF MONTHS
CREDIT TO BE ADDED TO THE LOWER AGE LEVEL TO DETERMINE
A GIVEN MENTAL AGE FOR A CHILD


1. For half-year levels from II through V

   Items passed at       Items passed at higher age level
   lower age level           0       1       2
          4                  2       2       3
          5                  2       3       4
          6                  3       4       5


2. For levels from V through XIV

   Items passed at       Items passed at higher age level
   lower age level           0       1       2
          4                  3       4       6
          5                  5       6       8
          6                  6       7       9


3.
   Items passed          Items passed at Average Adult level
   at age XIV                0       1       2       3
          4                  4       5       6       9
          5                  6       8       9      12
          6                  8       9      11      13


4.
   Items passed          Items passed at Superior Adult I
   at Aver. Adult            0       1       2
          5                  5   [illegible]  10
          6                  8      10      14
          7                 10      13      17
          8                 12      14      18


5.
   Items passed          Items passed at Superior Adult II
   at S. A. I                0       1       2
          4                  8      10      15
          5                 12      15      20
          6                 15      18      21


6.
   Items passed          Items passed at Superior Adult III
   at S. A. II               0       1       2
          4                  9      12      18
          5                 12      18      24
          6                 18      22      27


Tables have been prepared to facilitate the process of
interpolating for the test limen when it falls between the lower
(younger) age level at which the child has passed more than
half of the items, and the next higher (older) age level where
he has passed less. The number of items passed at the lower
age level is given at the left and the number at the higher age
level is given at the top of each table. The body of each table
gives the number of months of credit (to the nearest whole
number) to be added to the age corresponding to the lower
age level. For instance, if a child passes five items at age seven
and two items at age eight, we enter Table 2 at the left and the
top, to find that the second row and the third column intersect
at the value 8. Thus to the lower age level of seven is added
eight months to yield a mental age of seven years and eight
months. Table 1 is to be used for determining limens that fall
within the age range II through V, where each half-year is
regarded as a separate age level. A table for interpolating
between IV-6 and V is unnecessary since its values are the same
as those in Table 1. Table 2 is to be used for finding limens
within the age range V through XIV. In each instance the
number of items passed below the limen, i.e., at the lower age
level, is found at the left of the table. Tables 3, 4, 5, and 6
are self-explanatory. There will, of course, be a few instances
in which the liminal method and the process of interpolation
are impossible, as for example, when four items are passed at
Superior Adult III level.
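
For readers who wish to check the worked example, the following Python sketch recomputes the months of credit from the interpolation formula. It assumes six items per level and a twelve-month interval for the range V through XIV, and it assumes that halves are rounded upward; the rounding rule is not stated in the text, and `months_credit` is a name invented for this sketch.

```python
from fractions import Fraction
from math import floor


def months_credit(n_lower, n_higher, items_lower=6, items_higher=6, interval=12):
    """Months to add to the lower age level, to the nearest whole month."""
    p_m = Fraction(n_lower, items_lower)      # proportion passed at the lower level
    p_l = Fraction(n_higher, items_higher)    # proportion passed at the higher level
    exact = interval * (p_m - Fraction(1, 2)) / (p_m - p_l)
    return floor(exact + Fraction(1, 2))      # halves rounded upward (an assumption)


# Rows: 4, 5, 6 items passed at the lower level; columns: 0, 1, 2 at the higher.
table_2 = [[months_credit(r, c) for c in (0, 1, 2)] for r in (4, 5, 6)]
print(table_2[1][2])  # 8, the value cited in the text for five and two items
```

Exact fractions are used so that borderline values such as 4.5 months are rounded consistently rather than being disturbed by floating-point error. Passing `interval=6` gives the half-year case.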

In order to compare the liminally determined mental ages
with those conventionally computed on the full and abbreviated
scale, one hundred Revised Stanford Binet test folders were
chosen at random from the younger age group in the current
files of the Mooseheart Laboratory for Child Research. Tests
chosen were restricted to those of children from approximately
seven to eleven in order to avoid limen determinations
at the adult levels where the changed scoring rationale would
obscure the basic comparison desired. Successes and failures
at each age level in the administration of the full scale were
recorded on cards together with the C.A., M.A., and I.Q.
for that test. Then, by a consideration of only those items
which are part of the abbreviated scales, each test was rescored
and an M.A. and an I.Q. calculated for an assumed
abbreviated scale administration of the same test. Then these
same tests were rescored a second time and assigned a mental
age score determined by the test limen as described above, and
a corresponding I.Q.

Assuming the full scale as the standard (and disregarding
the form, L or M), the abbreviated scale M.A. scores and the
test limen mental age scores were each correlated with the
M.A.’s of the full scale. The correlation coefficients were .98
for the abbreviated scales and .91 for the test limen method.
Then the I.Q.’s corresponding to these ages were correlated
with the I.Q.’s on the full scale. The coefficient of correlation
between I.Q.’s on the full and abbreviated scales was .97 and
that between I.Q.’s on the full and the test limen scales was
.83. The correspondence of scores may be further judged
from the fact that the mean absolute discrepancy between the
ratings on the long form and on the liminal form was a little
less than 7 points. In 75% of the cases the discrepancy was
less than 10 points; in 91% less than 15 points; and in 97%
less than twenty.

The savings in time of testing may be appreciated from
the fact that for these one hundred cases the average number
of levels administered in the full scale was 6.6 while the
average for the test limen determination was only 2.2 or
roughly a third. Even when the abbreviated scales are used
and only four out of six tests at each level are given, the
number of tests is cut only one third. Thus, the average number
of tests administered in the abbreviated scales for this
sample is still twice that required for the test limen determination.

In order to determine how this cutting of the length of the
test affected its reliability, out of the one hundred tests originally
selected 31 pairs were chosen which represented successive
administrations to the same individual child. In this way
data for reliability calculations upon a test-retest basis were
secured. It should be noted, however, that between test and
retest there was an interval of elapsed time of about one year
and that these reliability coefficients might be expected to be
lower than those of the usual test and retest immediately following
because of the changes in the individual and the conditions
of testing and even the change in the content of the test
itself since items of the same degree of difficulty were not
administered the second time. The reliability of the I.Q. score
for the long form of the test for the 31 cases was .86, for the
abbreviated scale .75, and for the limen form .61. This drop
in reliability with the use of fewer and fewer items for the
determination of the score is roughly in accordance with the
expectations derived from the Spearman-Brown prophecy
formula.
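
The comparison with the Spearman-Brown prophecy formula can be made concrete with a brief Python sketch. It assumes, from the figures given earlier, that the abbreviated scale is about two thirds and the limen form about one third the length of the full scale; the function name `spearman_brown` is chosen for this illustration only.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test of reliability r is lengthened k times.

    k below 1 represents a shortened test.
    """
    return k * r / (1 + (k - 1) * r)


full_scale = 0.86                       # reported reliability of the long form
abbreviated = spearman_brown(full_scale, 2 / 3)   # assumed two thirds the length
limen_form = spearman_brown(full_scale, 1 / 3)    # assumed one third the length
print(f"{abbreviated:.2f} {limen_form:.2f}")
```

Under these assumed length ratios the predicted reliabilities come out near .80 and .67, somewhat above the observed .75 and .61, which is consistent with the authors' statement that the agreement is only rough.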


The equivalence of the mental age scores secured by the 
two methods of scoring may also be judged upon the basis of 
the statistics presented below. 




                     I.Q.                    M.A.
                          Standard                Standard
Form           Mean       Deviation    Mean       Deviation

Full           108.1      16.1         113.53     22.5
Abbreviated    106.4      15.7         114.92     22.6
Liminal        108.9      14.0         114.37     19.6



The I.Q. and the M.A. means are practically the same for the
three sets of scores. The variability of the I.Q.’s and the
M.A.’s naturally decreases as we pass from the full form to
abbreviated and liminal forms because of the reduction in the
length of the test.

In the evaluation, on the basis of these data, of the limen 
method of scoring as opposed to the simple addition of number 
right, it should be pointed out that this estimate of its effective¬ 
ness is specific to the Revised Stanford Binet and is affected by 
the extent to which the Revised Stanford Binet satisfies or
does not satisfy the conditions for a true difficulty scale. 
Though the test limen method of scoring assumes a series of 
items ordered in difficulty, this condition is only approximately 
met in the Revised Stanford Binet in the sense that though 
items at age ten are invariably more difficult than items at age 
five, yet between adjacent age levels there are inversions, as a 
number of empirical studies have shown. Such a condition 
might have been expected from the limitations imposed upon 
the placement of items. In general they had to be placed 
according to difficulty. In addition, the demands of variety, 
interest, etc., had to be satisfied for each age level. It is quite 
probable that if an age scale homogeneous as to content and 
rigidly ordered as to difficulty were obtainable, the liminal 
method would provide the most reliable measure of an individual’s
performance on a limited number of items. This
would be the case since the individual is scored upon the basis 
of his performance on items that are of 50 per cent difficulty 
for him. 

It will be a matter of concern to some examiners that there 
is no spread of performance to be analyzed. Or perhaps they 
will feel the need of some measure of individual variability. 
However, as it has been pointed out (1), the practice of 
inspectional analysis of individual successes and failures to 
secure a crude estimate of the individual’s “primary” abilities 
is at best questionable. Such scattering of passes and failures 
is based for the most part on factors inherent in the test, in
test construction, and in systematic errors. Furthermore, 
measures of individual variability on the Revised Stanford 
Binet possess no unique significance for the individual (3). 



The abbreviated method of constant stimuli thus enables
an examiner to secure mental scores reasonably equivalent
to those obtained by the conventional method of scoring of
the full scale, in half the time ordinarily required. Of course,
when administration time is ample, the full Stanford Binet
should be given. However, occasions arise in the clinic and in
the field when time is at a premium. At such times use of the
shorter method enables the examiner to administer a test that
would not otherwise be possible at all.

REFERENCES

1. Lorr, Maurice and Meister, Ralph K. “The Concept of Scatter in
the Light of Mental Test Theory,” Educational and Psychological
Measurement, I (1941), 303-310.

2. Mosier, Charles I. “Psychophysics and Mental Test Theory: Fundamental
Postulates and Elementary Theorems,” Psychological Review,
XLVII (1940), 355-366.

3. McNemar, Quinn. The Revision of the Stanford-Binet Scale (With
an introductory chapter by L. M. Terman). New York: Houghton
Mifflin, 1942, 185.





A TECHNIQUE FOR TESTING UNDERSTANDING 
OF THE VISUAL ARTS 


MELVIN W. BARNES 
University of Illinois


IN 1941-1942, the University of Illinois offered a survey
course in Literature and Fine Arts. This course, open to
sophomores, is one of seven full-year courses which now comprise
the lower-college program of the General Division of
the College of Liberal Arts and Sciences. One unit of the
course in Literature and Fine Arts is devoted to the study of
painting. In an attempt to appraise achievement in this phase
of the year’s work a testing technique was evolved which it is
the purpose of this note to describe.

In the conduct of the course the students were brought into 
contact with a wide variety of paintings by means of lantern
slides. These works of art were studied in terms of a fourfold
scheme of analysis: color, composition, expression, and
function. In some instances, a work was studied first in black 
and white for the purpose of emphasizing composition before 
it was considered in color. On a number of occasions two or 
more representations of a single theme or incident by different 
artists were studied comparatively. Since the basic aim of the 
course was to cultivate understanding and thereby—it was 
hoped—appreciation, little time was spent on artists, history, 
or techniques of painting. In addition to the reproductions
used in the classroom, materials owned by the Department of 
Art and others in the University museums were made available 
to the students. The course was rounded out with a day at the 
Chicago Art Institute. 

When the problem of testing achievement in this course 
arose, the following procedure was devised. So far as the 
writer knows, the device is unique. 



The technique employed two projecting lanterns to throw 
simultaneously two paintings in color on adjacent screens 
placed in the front of the classroom. By this method colored 
reproductions of a size approximately three feet by five were 
placed side by side in a way which permitted every member
of the class to see them clearly. The paintings had not been
seen by the class before the time of the test. The test, which 
was mimeographed, was based upon points of similarity and 
contrast between the paintings thus reproduced. Before the 
showing of the paintings each student was given a copy of the 
test and time was allowed for reading the directions. By the 
adjustment of Venetian blinds the room was darkened enough 
to permit good vision of the projected pictures, while enough 
light was admitted for reading and writing. Of those taking 
the test the only writing required was the indication of re¬ 
sponses by a letter written in a blank space. 

The pair of paintings was selected chiefly for their 
numerous points of contrast. One painting was a Rubens, the 
other a modern work by William Gropper. The test items
were organized in accord with the scheme of analysis and 
synthesis which had been followed in class. The first set of 
items dealt with color, the second set with composition, and 
so on. A variety of the conventional multiple-choice items was
used. The following is an example: 

In Painting A (the Rubens was designated A) more use is
made of

    a. linear contrasts
    b. straight lines
    c. sharp angles
    d. rhythmic curves

than in Painting B.

The test is in process of analysis, the results of which will be 
used in revising and lengthening it. 

This technique obviously does not require any particular
type of test item but is adaptable to all of the conventionally
used forms involving comparison and contrast. Reproductions
of works of sculpture and architecture could, of course, be
utilized as well as those of painting. Since this method affords
a means of providing colored reproductions which are precisely
relevant to the aims, content, and method of a course,
it appears to have possibilities for classroom use which the
standard tests on the market do not possess. This type of test,
moreover, is decidedly inexpensive, whereas the cost of the
better standard tests in art places serious restrictions upon
their use.






SOME OF THE LESS MEASURABLE OUTCOMES
OF EDUCATION*


EDWIN J. BROWN

Kansas State Teachers College

I NEED not say that I appear before this group with considerable
apprehension and not a little hesitation. Frankly,
it is not the group which is causing my trepidation, but my
subject.

When one speaks of outcomes in education he is talking
about the very essence of it all. He is talking about our end-
product, the thing for which we throw in the current, gear up
the machinery, put in the man-hours, spend the money; the
thing which we get after we’ve done the work. Outcomes in
education are for us what the finished interceptor, the eight-
gun pursuit plane, the one-ton bomb, the 155-millimeter field
piece, the thoroughly trained airman is to our war program.
Thus my hesitation in discussing outcomes in education at all,
even those which we are agreed are more or less measurable.
To discuss the more measurable outcomes before this group
would take some boldness and one should attempt it with
much hesitation, but to discuss the less measurable outcomes is
about twice as risky. My only comfort is that no one knows
much more about it than does his neighbor. And I am not
supposed to tell how to measure them.

First of all, what are some of the outcomes (may I assume
there are such) which we want to get, outcomes which are
difficult of measurement?

May I say that after spending some fifteen years rather
directly in the field of measurement, I am not nearly so certain
of the efficiency of the work as I once was. Heresy? Right,
but you can throw me out of the organization later. In general,
I’ve about come to the conclusion (there are exceptions, of
course) that the ease and accuracy with which any educational
outcome is measured is in direct proportion to its unimportance.
That is, the easy items to measure accurately are the
ones which make the least difference whether they are measured
or not. I agree that there are notable exceptions to my
generalization. In general, though, you agree with me, don’t
you, that the more important things of life tend to be beneath
the surface, too deep to be picked up readily on the hooks of a
question, and that measurement is usually involved in questioning,
either direct or indirect?

*Paper read at the meeting of the National Association of Teachers of
Educational Measurement, San Francisco, February 24, 1942.

Does this mean we should not try to measure these things? 
I’d say, “Certainly not." Let's work on the thing rather than 
say that it is one of the unmeasurables and that we can't do 
anything about it. 

First of all, I’d like to start with the thought that the
more difficult of measurement outcomes fall into general
classes. (There may be three, six, or nine.) Let us call two
of these difficult-to-measure groups, for want of a better term
at this time, Outcomes in Attitudes and Outcomes in Appreciations.
I can measure fairly accurately some outcomes in
arithmetic skills, in spelling accuracy, in verb usage, but I seem
to have much difficulty in measuring the same youngsters in
their attitudes toward arithmetic, toward spelling, toward
grammar. I find myself relying on clues which are not too
clearly defined in my own mind, when I try to measure their
attitudes. Can these clues, then, be developed, expanded? Are
the big things in life, after all, caught rather than taught? I
sometimes get confused, and when so, am inclined to say yes.
If we go into attitudes we can break them down into any
number of divisions. There are attitudes toward school,
toward home, toward boys, toward girls, toward law and
order, and so on, exhaustively. However, these can be grouped
into two big categories from which a further breakdown might
come. I refer to attitudes and traits which are primarily concerned
with personal growth and attitudes and traits primarily
concerned with our relationship with others. Each general
item is susceptible to a further breakdown, of course. This I
shall suggest later.

Isn’t one of the wrong assumptions we make when we 
speak of the more unmeasurable outcomes of education that 
we are inclined to fail to do what we always do in working 
with the more measurable items, viz., break them down into 
some of their component parts? We take arithmetic computa¬ 
tion and break it down into the four fundamental operations 
of adding, subtracting, multiplying, and dividing. Then we 
take addition and break it down even further, trying to find 
not only the weakness in addition but the cause of that weak¬ 
ness. Might we not, if we tried seriously, break down atti¬ 
tudes concerned primarily with our own personal growth, into 
smaller units, breaking these in turn into still smaller ones 
until we might secure parts small enough to be measured? 
Of course, we would not be sure that an old axiom would not
be ruined and we’d find after we did our measuring that the 
whole is not equal to the sum of the parts. 

Suppose we take the topic of attitudes in personal development.
What are some of the things which might be considered
from the viewpoint of a high-school boy or girl? Of course 
no one can name all of the desirable attitudes which are worth 
considering, but let’s begin: 

1. An attitude of open-mindedness. We might interpret 
this as the Evaluative Criteria for the Cooperative 
Study for Secondary School Standards does. A willing¬ 
ness to revise opinions and conclusions in the light of 
new evidence. 

2. An attitude of critical-mindedness. Disposition to seek
causes or explanation, to weigh evidence with care, and
to withhold judgments until sufficient evidence is in.

3. An attitude of concentration. Ability to give attention 
through a considerable period of time in spite of dif¬ 
ficulties or distractions. 



4. An attitude of industriousness. Disposition to use time
and ability effectively and constructively.

5. An attitude of responsibility. Willingness to acknowledge
responsibility for one’s acts and obligations.

6. An attitude of self-reliance. Willingness to make decisions
and carry out plans oneself instead of depending
on others or the school.

7. An attitude toward self-control. Ability to avoid display
of temper or other uncontrolled emotion.

8. An attitude of creativeness. Desire to do or say things
in a new or better way.

9. An attitude of enthusiasm. Readiness to enjoy life and
participate in its wholesome activities.

These we will all grant have something to do with personal 
development. 

There can be little doubt but that some of these items are
more susceptible to objective measurement than are others.
Again, there is little doubt but that each of these is more susceptible
to measurement than is the general outcome used for
illustrative purposes from which they came, the outcome of
personal development.

My suggestion is now that each of the items be considered 
in turn for a further breakdown. This, of course, would 
entail the development of a valid definition, which probably 
would go back to the common consent, massed judgments 
technique. 

In the field of social relationships we have another of the
more difficult to measure outcomes of education. No one, of
course, argues that the outcome is insignificant because it is
hard to measure, or that it is found completely embodied in
other outcomes which are easier to measure. Shall we then
pass it up entirely? Let’s see what we might do to it, again
falling back on the Cooperative Study material for suggestions.
We suggested that open-mindedness, critical-mindedness,
concentration, industriousness, responsibility, self-reliance,
self-control, creativeness, and enthusiasm are desirable, but difficult
to measure, outcomes of personal development. Now what are
some desirable outcomes, broken down just once, of social
relationships? Suppose we say:

1. Social-mindedness. Willingness to subordinate personal
advantage to the common good.

2. Co-operation. Desire to work agreeably with others.

3. Tolerance. Good will toward groups or individuals of
different race, customs, or opinions.

4. Courtesy. Consideration of others.

5. Generosity. Willingness to share opportunities or privileges.

6. Honesty. Integrity in handling money, straightforwardness,
sincerity in personal relationships.

7. Dependability. The extent to which one fulfills promises,
discharges obligations, finishes tasks.

8. Loyalty. Devotion to interests of friends, school,
home, country.

9. Fair play. Unwillingness to take advantage of others
or another.

Our difficulty, of course, is to get measurements of attitudes
toward, not just of information about. One can comparatively
easily measure information about, but not the
attitude toward.

One could go on and build these up. The point I would 
make is that while these outcomes are difficult of measurement,
each breakdown tends to become more objective or perhaps
better, less subjective. If in turn one were, for instance, to
analyze honesty for a high-school pupil, it might be found that 
a fairly valid test could be set up. I’m inclined to guess that if 
the validity could be assured, reliability would follow fairly 
readily; that is, a test could be made which would agree with 
itself. 

Appreciations is the generic name we give to another group
of outcomes which are deliberately sought. We have, however,
not gone so far as we might in developing measures, largely,
I suppose again, because of the feeling of intangibleness.
Undoubtedly, these are even more difficult of measurement, as
the emotional factor enters in. However, again, it is not out
of place to say that appreciation of beauty in nature, or better,
in art, is not so difficult to measure as appreciation in general.
Appreciation of commendable conduct and qualities in others
might be measured, indifferently well perhaps, but still
measured.

Appreciation of home and family would seem susceptible 
of some measurement, and so on for other items such as ap¬ 
preciation of good workmanship, appreciation of spiritual and 
religious values, appreciation of law and constituted authority, 
and others. 

It seems to me that a prime reason for not doing more, at
least in the attempt to measure certain educational outcomes,
lies in our unwillingness to attempt a further breakdown which
is always a step in measurement. To illustrate: Under outcomes
of social relationships which are surely end-products of
education, I listed among others, courtesy and fair play. Let’s
see what a further breakdown would do. We defined courtesy
simply for the sake of mutual agreement as consideration for
others. Let’s start a further analysis, considering the subject
from the viewpoint of a junior in high school, somewhat as
follows: Do I always wait my turn? Do I refrain from loud
talking and laughing when it disturbs others? Do I refrain
from interrupting others when they are talking? Do I offer
to share what I have with others? And so on. Isn’t it possible
that this rather intangible outcome of education can be measured,
and much more reliably than we are inclined to think?

Fair play, which is one step down from the general outcome,
social relationship, might in like manner be susceptible
to analysis. Fair play: unfair advantage; cheerful loser;
modest winner; recognition, appreciation, and commendation
of skill in others; consideration of the sensibilities of others;
abiding by decisions without question either by word or act;
loyalty to personal and team ideals; observation of training
rules in athletics; winning without boasting, losing without
whining; willingness to sacrifice for a group good, and so on.
Each item in turn might be analyzed still further. We make
the mistake of trying to measure an important item like this
with a short test. It requires a test as well developed as a
Binet revision.

Perhaps, in conclusion, it should be said we must not be
too much perturbed at first about the measurement of these
outcomes but should turn our attention first to securing these
outcomes with a greater degree of certainty. It would be
desirable indeed to be able to measure results in a citizenship
class, if we are sure that we are teaching citizenship. Patriotism
is dear to every American’s heart, but who knows for dead
sure what it is, or how to teach it?

I would indeed be disconsolate did I not believe we are
doing better work both in teaching these intangible outcomes
and in measuring them than we think we are. The clues
we get are probably fairly good. This is, of course, wishful
thinking. Someone has said, “’Tis better to travel hopefully
than to arrive,” and Browning puts it, “A man’s reach must
exceed his grasp or what’s a heaven for?”






THE AIMS, OBJECTIVES, AND OUTCOMES OF THE
OHIO TESTING PROGRAM*


RAY G. WOOD

Ohio State Department of Education

Today the world is moving along at an increased tempo
and at a higher degree of efficiency. Education, too, must
swing into step, and it is doing so. Education, it is agreed
today, is for the whole of life; it is concerned with the
development of the "ce/m/r futiiirc of every pupil," with the
integration of his total personality for the good of himself and
of society. There is disagreement, however, as to just how
much time and attention the school should give to the
development of the social, moral, and physical phases of the child,
and how much time and attention to the development of his
intellectual side.

Some progressive extremists would sacrifice the mental
development of the student to the personalizing of his
character, or make it a complementary factor only; traditional
extremists would reverse the procedure. Neither policy is
wholly applicable to our present educational situation or meets
the needs of the typical American school.

What is needed now, more than ever, is that from our
theorizing and experimentation some tangible and definitely
constructive guiding principles, practicable in all our public
schools, be evolved for the development of integrated
social-intellectual personality. What these shall be is still a question,
but I believe with many that we should train the mind to the
maximum of possibility, taking into account the limitations of
all methods, salvaging the best that is in them, and inventing
new ones that are more generally successful.

*Paper read at the meeting of the National Association of Teachers of
Educational Measurements, San Francisco, February 2% i a.


In doing this we shall arrive at new methods, new
objectives, new subjects, and new curricula. But to make possible
this more comprehensive program, with its broad conception of
the schools' activities, time (so essential to everything now)
must be saved. Overlapping, useless, and minor material will
have to be discarded to make way for the many newer and
more important elements that are to be added; that is, a better,
clearer understanding of the tangibles needed will leave more
time for the needed intangibles to be developed.

And it is for this saving of time, it is for the determination
of achievement in the needed tangibles, that a good testing
program is requisite.

We in the testing field in Ohio are justly proud that our
work is helping to do this, that it is in line with the modern
philosophy of education, in fact, that it is giving direction to
it in a concrete and personal way. By providing the great
majority of teachers in Ohio a scientific means of analyzing
and evaluating their product, we are saving them time to
achieve more, and more efficiently.

Our program, which has been carried on since 1929 as a
division of the State Department of Education, is in its several
phases unique to Ohio. Its chief objective is the motivation of
scholarship: it stimulates the educational units to put forth
more effort and seeks to increase the efficiency of that effort.
There is no compulsion whatsoever about participation in our
program, and there is no attempt at standardization. These
are important features, we believe, contributing to its
effectiveness and its popularity. Besides, it is distinctly a product of
and for the Ohio schools, because the tests are built by Ohio
teachers for Ohio children. They are designed not to
determine the success or failure of the individual but to help
teachers to adjust their teaching to the needs of their children
and the children to adjust themselves to the work of the
particular class or subject. Because of this, students in the
state no longer approach tests with fear and trembling, in such
a disturbed state emotionally and mentally that the results are
impaired, but they anticipate the testing with a spirit of
sportsmanship and with a realization that whatever the results may
be, they will aid, not hinder, them.

Since the beginning of our testing program, the democratic
philosophy of education has been the guiding principle upon
which the work of the Ohio Scholarship Tests division is
based. The tests of the program are revised annually, and
thus provision is made for meeting the changing curricula,
textbooks, and methods of educational practice. Furthermore,
there is no compulsion to use any of the tests administered.
Schools, private as well as public, are free to use any phases
of the program for such purposes as they wish.

Stated concisely, the objectives of the several phases of
our program are:

1. To provide materials for the improvement of instruction.

2. To provide a continuous program of new and improved
tests.

3. To provide for the motivation of students toward
greater accomplishments in their classroom activities.

4. To provide pertinent instructional research data.

5. To provide curriculum guides.

These objectives are achieved variously by the six distinct
phases of our testing program, which are: The Every Pupil
Tests, the Eighth Year Test, the General Scholarship Test for
High School Seniors, the District-State Scholarship Team
Test, the Senior Survey Tests, and the Bulletins of Research
and the Curriculum Guides.

An idea of the popularity of the program may be gained
from these figures: a total of 1,200,772 tests were administered
in this program last year. In this number was represented
every county in the state and over a thousand large
and small city schools. Of this number, 41,269 were
eighth-graders; 1,146,672 were grade-school and high-school pupils
who took part in the Every Pupil Tests; 5,305 were
high-school seniors in the upper third of their classes; and 7,526
were high-school students in grades 9 to 12, who were selected
to take part in the annual spring academic and commercial
contests.

This popularity is due to the fact that the school men of
the state look upon the testing program as one of the most
vital and beneficial functions in the state educational set-up.
They recognize the testing program as one of their own
supervisory tools, as an active force growing out of and along
with the actual conditions in their schools, not as a measuring
stick imposed from without.

A brief discussion of the several phases of this program 
will give a clearer picture of the program. 

The Every Pupil Tests, because they affect the greatest
number of children, may be considered the most important
phase of our program. There are two series of these tests,
the First Every Pupil Test, administered in December for
general diagnostic purposes; and the Second Every Pupil Test,
administered in March for an achievement measurement and
as a check upon the effectiveness of the remedial teaching which
has been carried out on the basis of the results of the First
Every Pupil Tests.

In these series are included tests for all the subjects that
are most commonly taught in Grades 3 through 12. For
example, we have tests in English for Grades 3 through 12;
Reading for Grades 2 through 12; Mathematics for Grades 3
through 10; and in the other subjects such as Latin, French,
Geography, Social Studies, Chemistry, Physics, General
Science, Biology, Health Education, and Hygiene. New forms
of each test are developed for each administration, and new
tests are added from time to time, such as Attitudes and Skills
in the Use of References, Conservation, and Scientific
Thinking.

These tests are of the achievement type and are so
constructed that they give good general diagnoses of both the
individual and the class abilities and deficiencies in the
preparation in the specific subject. For an analysis of the more
puzzling and particular individual difficulties, the teacher must
use specifically diagnostic and functional tests; but, for the
general group and the average pupil, these tests have proved
very satisfactory in the thirteen years they have been
administered.

As was mentioned before, these tests have their origin in
the Ohio classrooms. They are constructed by Ohio teachers
of recognized ability, working in committees or individually;
they embody suggestions sent in by teachers and administrators
throughout the state; and they are validated against research
studies, committee reports, the most used textbooks, and the
Ohio Curriculum Guides. In this way we are sure that the
tests are really measuring what is or should be taught, and
that they are of service to the teachers and students in
emphasizing and vitalizing the important content.

Each subject-test takes one forty-five minute period and
may be administered either in the individual classroom or as
best suits the purposes of the school. All tests are scored by
the classroom teacher. Forms are furnished with each order
for recording for state use the general distribution of the
scores of a class and for making item reports. These reports
are, of course, kept in strictest confidence. As soon as they
are received by the state office, state percentile and item norms
are compiled from the data. These are then printed and mailed
to the participating schools.

By analyzing and interpreting the results of the work of
her own class in the light of these norms, the teacher is able
to determine wherein there are deficiencies and can set about
to determine the particular causes of the weaknesses. Similar
analyses and interpretations may be made for the individual
pupil; in fact, in not a few instances the students themselves
analyze and interpret their own results. Thus, comparatively
early in the year both the teacher and the student have an
estimate of the equipment of the student for the course and
indications of probable areas of difficulty, and together they
can set about to discover the causes of these difficulties. When
the causes are determined, then a definite remedial program
may be settled upon.
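
The comparison of a class with the state percentile and item norms can be sketched as follows. This is a minimal illustration, not the state office's procedure: it assumes the norm group is available as a plain list of scores, counts tied scores as half below, and uses a hypothetical 10-point margin to flag an item as a weakness. The function names are likewise invented for the example.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    if not ordered:
        raise ValueError("norm_scores must not be empty")
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def weak_items(class_pct_correct, state_pct_correct, margin=10.0):
    """Items on which the class falls at least `margin` points below the state.

    Both arguments map an item label to the percent of pupils answering it
    correctly. The margin of 10 points is an assumed threshold.
    """
    return [
        item
        for item, pct in class_pct_correct.items()
        if state_pct_correct[item] - pct >= margin
    ]
```

A teacher might apply `percentile_rank` to each pupil's score and `weak_items` to the class item report to decide where remedial work should begin.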

After the Second Every Pupil Test has been administered
and the analyses and interpretations have been made, a
comparison with the results of the First Every Pupil Test reveals
whether progress has been made and whether the remedial
procedures have functioned effectively. Through these
comparisons every pupil has a diagnosis of his achievement and
has evidence of the phases of the subject he has mastered and
the phases upon which he must still concentrate. This is the
factor that is emphasized continually: the improvement of
the original product. It has done much to lessen, if not to
erase completely, the fears of teachers that these tests are a
means of measuring teacher efficiency. Teachers have come to
realize that high scores or low scores on the tests are not the
important factors, that the really important factor is the
improvement of learning as indicated by the raising of the
scores of the individual pupils and of the class between the
first and second series. They know that a low-ranking class
that shows progress is a better evidence of good teaching than
a high-ranking class that remains at the same level from test
to test.
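
The comparison between the first and second series can be sketched as a mean gain per class. The sketch assumes that scores from the two administrations are on a comparable scale (for example, percentile ranks against the state norms); the paper does not say how the different test forms are equated, and the function name and data layout are invented for the example.

```python
from statistics import mean


def class_gain(first_scores, second_scores):
    """Mean per-pupil gain between the first and second test.

    Both arguments map pupil name -> score. Only pupils who took both
    administrations are compared.
    """
    common = first_scores.keys() & second_scores.keys()
    if not common:
        raise ValueError("no pupil took both tests")
    return mean(second_scores[p] - first_scores[p] for p in common)


# A low-ranking class that improves versus a high-ranking class that does not.
low_class = class_gain({"a": 20, "b": 30}, {"a": 35, "b": 41})
high_class = class_gain({"c": 80, "d": 90}, {"c": 80, "d": 90})
```

On this measure the low-ranking class shows the larger gain, which is the sense in which it gives better evidence of good teaching.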

The Eighth Year Test and the General Scholarship Test
for High School Seniors are two of the other important parts
of the Ohio Scholarship Testing program. These tests are
measures of the cumulative achievement of these respective
groups and are administered in the spring of each year. The
tests are designed to measure not only factual knowledge as
such but also the ability to use this knowledge in functional
situations, and to stimulate the desire for the acquisition of
such knowledge and ability. The Eighth Year Test is a
two-hour test in the four fields of English, mathematics, history,
and science, and may be taken by any eighth-grader. The
Senior Test consists of rather general tests in the following
five areas: English, mathematics, science, social science, and
reading and functional language, with 30 minutes allowed for
each test. Because one of its chief purposes is the selection of
outstanding students, only the upper third of the graduating
seniors are admitted to this examination in Ohio. However, it
is administered to all high-school graduates in the state of
New Mexico, where it is administered under the direction of
the University of New Mexico at Albuquerque. Other colleges
and universities outside Ohio likewise use this test in their
freshman placement programs.

High general scholarship is evidenced not merely by an
acquaintance with the basic principles in the general fields of
learning but likewise by the ability to apply this knowledge to
life situations; it calls for a broad as well as a thorough
educational preparation. The objective in these two tests of our
program is the stimulation of this high general scholarship,
and it is evident that they are serving as real stimuli in this
respect. Follow-up studies have proved that the specific
incentives of these tests have resulted in an increased interest of
students in their achievement, have encouraged many to
broaden their educational preparation through more wide and
general reading, and have led many high-school students to
choose more widely from the courses offered in their program
of studies. Hundreds of scholarships are awarded annually
by many colleges and universities in and outside of Ohio to
seniors ranking high in this test. Follow-up studies have
shown that these students do considerably better than average
work in the institutions at which they matriculate; and research
has shown that these tests are predictive for the group as a
whole of probable success in continued education.

The fourth part of our testing program, the District-State
Academic and Commercial Scholarship Tests, is administered
in May of each year at the five state universities (in
Ohio well located geographically for this purpose). The
District-State Academic Test has become the “scholastic
event” of the year in Ohio; it fosters interest in academic
achievement in a manner akin to that in which athletic events
stimulate athletic prowess. The students enter with a great
deal of enthusiasm and interest, and they work hard in order
to make the scholarship team and participate in this “academic
field day.” Last year 7,526 students in a total of 237 teams
participated. Schools send teams of 32 students, or fewer, to
their closest university center to compete with other teams and
students in that district for academic honors. Each team is
limited to two entrants in each of sixteen subjects, which include
English, mathematics, history, the sciences, Latin, and modern
languages. The schools are classified according to enrollment,
except that schools of a county system combine to send one
team. Points are given for the first twenty places in each of
the subjects, and the teams are ranked superior, excellent, or
honorable mention, according to the number of points they
accumulate. District awards are made to individuals and to
teams according to the classifications of the schools. Then all
papers are sent to our state office, where similar awards are
computed for the all-state winners.
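
The accumulation of team points can be sketched as below. Only the rule that the first twenty places in each subject earn points comes from the text; the point schedule (20 for first place down to 1 for twentieth) is an assumption made for illustration, as are the names used. The cut-offs for the superior, excellent, and honorable-mention ratings are not given in the paper and are therefore left out.

```python
from collections import defaultdict

PLACES_SCORED = 20  # from the text: points go to the first twenty places


def team_points(results):
    """Total points per school.

    `results` maps subject -> list of school names in finishing order, one
    entry per contestant. Assumed schedule: 20 points for first place down
    to 1 point for twentieth place.
    """
    totals = defaultdict(int)
    for finishers in results.values():
        for place, school in enumerate(finishers[:PLACES_SCORED], start=1):
            totals[school] += PLACES_SCORED - place + 1
    return dict(totals)
```

Because each team may enter two pupils per subject, a school can appear twice in one subject's finishing order, and both placings count toward its total.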

The purpose behind this part of the program is again a
motivation of scholarship, especially by the granting of
recognition to students of outstanding achievement. It should be
noted that it is not only the thirty-two pupils on the team who
receive this motivation, but also the hundreds of others who
try to make the team. The truly professional and able
administrator makes the most of this opportunity and encourages
every student to strive to win a place on the scholarship team,
by not announcing the appointees until just before the meet.
We have athletic contests, music contests, and forensic
contests; why shouldn’t we have academic contests? Why
shouldn’t we popularize the “brains” as well as the “brawn”
of our schools? Ohio schools have recognized the values of
this academic competition, and school people would not be
happy if it were to be discontinued.

The Senior Survey Tests and remedial materials, the fifth
phase of our program, are comprehensive diagnostic tests
designed for locating deficiencies in the fundamentals in English
usage, reading, and mathematics, and are administered the first
week of either semester. Along with the tests are provided
manuals and workbooks for the remedial work, which are so
organized that the instruction may be carried on individually
or in groups, with a minimum of direction on the part of the
teachers. This part of the program had its origin in the plea
of college leaders who found many high-school graduates
entering their institutions poorly equipped in these
fundamentals so necessary to the carrying on of successful work. If
high-school students who enter college are in need of remedial
work in these areas, many who do not go on to college
probably also need to have their ability in these fundamentals
improved. Recognizing the urgency of this need, the Ohio
State Department of Education is granting one-half unit of
high-school credit to each senior who shows proficiency in
overcoming the weaknesses clearly indicated by his results on
Form A of these tests. Many colleges and universities
throughout the nation have recognized the merit of these
materials and are using them as a part of their freshman
program of testing.

So much for the tests themselves and the valuable services
they render to teacher and pupil alike, when properly
administered, analyzed, and interpreted. Let me suggest briefly some
of the concomitant values that are to be derived from such a
co-operative testing program.

The first is the great number of research studies that the
results of these tests give rise to and that are of particular
value to the teachers of the system because the bases lie in the
local situation. In Ohio many such studies have been
completed, and teaching procedures have been influenced to the
end that the indicated weaknesses are being remedied. Very
complete studies have been made in English, mathematics,
and the social sciences, and less comprehensive ones in other
subjects. Two research reports, R., and R,i, have recently been
issued on the Every Pupil Tests; these give superintendents
and teachers techniques for determining growth learning
curves on the basis of their own class scores, and of
interpreting their significance to the individual pupil and teacher.

A second very important concomitant value is the aroused
interest of teachers in the improvement of their classroom
product and in an understanding of the scientific diagnosis and
measurement of their teaching procedures. This has been
evidenced by the vitalizing of curricula materials and by the
functionalizing of the learning process. Most of the tests for
the entire program are built by individual classroom teachers
whose work has been recognized as outstanding or by
committees of teachers in the field of the subject. Teachers and
administrators alike have expressed amazement at the broader
understanding of their task and the other worth-while results
that have come to them from their participation in these
test-building projects. Not only have teachers taken an active part
in the construction of the tests, but they have also been active
in the writing of Curriculum Guides, which suggest materials
to be taught and abilities to be developed in the various fields,
recommend methods of procedure, and provide a working
bibliography. These Guides are not attempts to dictate order
in the presentation of material or methods of procedure, nor
are they attempts to supplant the local course of study; they
are designed to assist teachers and local curriculum committees
in the re-examination, re-evaluation, and re-formation of
their curricula in the light of present trends in educational
philosophy.

A third and most important concomitant value is the
stimulation and motivation of the thousands of students who
annually come in contact with this program. The Every Pupil
Tests help them to help themselves; they have specific and
objective evidence of their achievements and of their lack, of
their abilities and of their deficiencies, and go about their
remedial classwork with understanding and with determination
to improve. The other tests create an active interest in not
just good scholarship, but in excellence of achievement.

Time does not permit the listing of more of these values
nor the further elaboration of those already mentioned; each
in itself would furnish material for another paper. However,
these suggestions and this brief survey of our Ohio Scholarship
Testing Program have, I hope, presented the possibilities of
a varied, comprehensive, and co-operative state testing
program, and demonstrated that such a program is of real service
to its participants, teachers and pupils alike.




EDUCATIONAL REQUIREMENTS AND
OCCUPATIONAL LEVELS


RICHARD D. ALLEN* and LESTER F. KRONE
Department of Public Schools, Providence, Rhode Island


In every job description and worker description there
appears the category “Educational requirements.” These
educational requirements for almost any kind of work are
somewhat elastic. In good times when labor is scarce,
standards always move downward; while in periods of
unemployment, standards are automatically raised. This is true with
the standards of college and professional schools as well as
with the standards of apprenticeship and of employment in
the less skilled classifications. In fact, most employers and
personnel workers regard educational requirements as merely
a convenient and economical screen with which to eliminate the
less desirable applicants.

This situation is interesting in view of changes in
educational practice during recent years. Formerly most school
adjustments were made in terms of grading; that is, the slower
pupils were kept back grade after grade until there was a
high percentage of over-ageness for the grade throughout the
school system, at least until the legal age of school-leaving
began to operate. Under such conditions the “last grade
attended” really had a definite meaning in terms of school
achievement and educational qualifications. In recent years,
however, there has been a strong tendency toward the practice
of promoting most children from grade to grade largely on
the basis of age and attendance. Under such conditions,
differentiation of instruction has been accomplished by
classification or grouping within the grade, or by group assignments
within each class. Consequently the “last grade attended” by
any child may not be a fair indication of his educational
qualifications. In fact almost the only accurate method of appraisal
of school achievement is by means of standardized tests, the
results of which are relatively independent of the school
system, the school, the curriculum, the teacher, and other
factors such as the policies of pupil adjustment in the
individual school or school system.

A study of achievement among pupils of any grade
indicates a distribution of scores covering a range of from five to
eight school grades or educational ages. Under these
circumstances, a diploma or a grade of school leaving means little
unless it is supplemented by such information as the teachers'
marks in academic subjects, marks in special subjects,
information regarding the curriculum, and the classification of the
pupil, and especially, if possible, marks in standardized tests
in the basic skills and core subjects. These matters are not
generally understood by employers and personnel workers.
Instead they usually condemn the school product in a rather
general and wholesale fashion, assuming that schools still are
like those with which they were once familiar. An illustration
in point may be helpful at this stage:

In a large manufacturing plant the personnel manager, a
very capable and wise man, called the school placement office
for a bus boy to work in a lunch room, clearing away the
dishes and carrying the trays. A strong body, a willing and
pleasant disposition, and a reasonably agreeable personality
were the chief requirements. There was little or no
opportunity for advancement, and experience had shown that an
intelligent boy would not remain long at such work. The
counselor selected a big, strong, sixteen-year-old boy from a
special class for backward children. He was nominally a
seventh-grade pupil, but actually his achievements in arithmetic
and reading were about on the third-grade level. However, he
was the kind of boy that the teachers would always send on
simple errands or use as a helper in routine tasks. He was
cheerful, willing, clean, and from an average home
background. The counselor explained all of this to the personnel
manager, who accepted the boy and reported after the first
month that he was doing very well.

One day, however, the office boy in the main office left to
enter the military service; a second boy was ill; and a third
had been sent on a long errand. It was necessary to move a
number of heavy articles from the office and the bus boy was
requisitioned for the purpose. In the midst of his work the
general manager called him and sent him on an errand. The
directions would not have been difficult for an average boy,
but they were extremely difficult for him. He had to ask
directions and it became evident that he could not read the
names of officials on the various doors and some of the other
strange signs around the plant. When this was reported to
the manager he roundly condemned his personnel man for
hiring such a stupid boy, and the school placement office for
recommending a boy who could not read, and the school system
for graduating a pupil under such conditions. This man was
evidently thinking of the schools of a generation ago, and of
conditions in employment at a time when bright and capable
high-school graduates were glad of an opportunity to work
from the bottom up. I am sure that he did not mean to be
unfair or unjust.

If personnel officers and counselors are to be realistic and
truly helpful to young people, it is extremely important that
they should neither over-estimate nor under-estimate their
achievements and abilities. An accurate appraisal of
educational status and achievement is absolutely necessary in order
to determine the readiness of any individual to enter a
program of training at any occupational level. While a rough
approximation of status and achievement may be obtained
from the school record, it is at present possible to bring any
record up to date by means of a battery or inventory of
achievement and aptitude tests. No single battery will
adequately serve the entire range of individual differences to be
found among adults. Instead at least five different batteries
would seem necessary to accomplish the purpose. Even a few
years ago it would have been difficult to have selected adequate
test batteries for this purpose, but at the present time there
are at least five excellent batteries which are available and in
common use, and there are at least five different groups or
levels for which these tests may be appropriately used.

These levels in both educational and occupational oppor¬ 
tunities are shown in the accompanying chart: 



Interpretation: A person's present status may be estimated by his last
grade or by his test record. In any occupational level, a person in the highest
third (A or B) is preferred and can usually obtain work even in slack times.
A person in the middle third can usually obtain work in normal times. Persons
in the lowest third may meet the minimum requirements, but can improve their
chances of getting and holding a job by entering at a lower level and
progressing through in-service and supplementary training. This procedure solves
most problems involving discrimination.


Richard D. Allen.
Lester F. Krone.


The lowest level shows a group of approximately two
thousand pupils on a seventh-grade level in the Providence
junior high schools. The figure shows the distribution of these
pupils in a battery of achievement tests. Both the
Metropolitan and the Co-operative tests have been used and have
shown approximately the same range and distribution of
scores. It is interesting to note that more than a thousand
adults and young people over eighteen years of age, who
applied for training as production workers in jewelry and
novelty manufacture, had a median score in reading, general
mathematics, and academic aptitude equal to the seventh grade
in the junior high schools. For these jobs there were
practically no requirements in regard to education. The work was
repetitive and did not require even a mechanical aptitude or
finger dexterity beyond the “D” level. However, all seven
hundred graduates of the jewelry class were trained and placed
after less than one month’s training at a minimum wage of
forty cents an hour.

Idle second ligurc is typical of the distribution of scores 
on the Co-operative tests at the end of the 9A or the beginning 
of the lOB grade. 'The median non-graduate school leaver in 
Providence has attained a tenth-grade status, has a C— I.Q., 
and ranks C in reading comprehension and general mathe¬ 
matics. It is interesting to note that in a group of more than 
two thousand men and women who were candidates for train¬ 
ing in defense classes in Providence, the median grade of 
school leaving was the tenth, and the median scores in academic 
aptitude, reading comprehension, and general mathematics
were approximately equal to the tenth-grade median. These 
jobs for the most part, in the opinion of employers, required 
ability to read directions and a reasonable mastery of funda¬ 
mental skill in mathematics. These workers may be classified 
as production workers of the higher level. 

The third figure indicates the results of the Co-operative 
tests on approximately two thousand high-school graduates in
Providence, showing range and percentile distribution. From 
these pupils are selected those who are admitted to colleges,
technical schools, apprenticeship schools, and nurses’ training
schools, as well as many of the more desirable occupational
opportunities which require high-school graduation.

The fourth figure indicates the range and distribution of
scores on the college sophomore Co-operative tests as shown
in the Co-operative testing program of both local colleges in
this area and the nation-wide program. Persons at this grade
level usually make their decisions for specialization in college
or for entrance into higher skilled occupations on the technical
and semi-professional level. For instance, such persons are
preferred for officer training in the armed services.

The fifth figure shows the range and distribution of scores
on the college Graduate Record Examination and also among
the candidates for teacher-training positions in the Providence
schools on the National Teacher Examinations over a period of
the past decade. At this level people are selected for entrance
to the graduate schools and the professions.

The arrangement of these five figures opposite a scale,
showing chronological and mental ages at the left and school 
achievement in terms of grades at the right, is for the purpose 
of facilitating appraisal and comparison. The procedure is 
somewhat as follows: Indicate by a check mark in the first 
column the person’s chronological age and by a cross his mental 
age. In the second column indicate by a check mark his last 
school grade and by a cross either his average scholarship 
rank as shown by his school record or the battery of achieve¬ 
ment and aptitude tests at that level. This may be done by a 
horizontal comparison of values indicated in the appropriate 
figure. Then draw a horizontal line from the cross on the 
grade measure intersecting the figures representing occupa¬ 
tional levels. Frequently this line will cross two or even three 
figures. If the line does not fall in the upper half of the figure
it is possible, and even probable, that the worker will find it 
easier and more profitable to attempt to meet the requirements 
on the lower level in which his qualifications will give him a
preferred status. 




This chart is to be used only in determining the most
advantageous place of entrance into an occupational level. It
helps to get a person on the pay roll on a realistic basis. The
second step in the process is to work out a program of supplementary
and in-service training and experience which should
help the individual to improve his educational and occupational
status and thus to earn promotion, after which he may, if he
so desires, transfer to a job in a higher occupational level.

The general use of objective methods in appraising the
qualifications of applicants and the use of such devices as the
present chart for purposes of comparing different levels as
well as in determining the status of a person within any group,
will be an important step in promoting equality of opportunity
and preventing favoritism and group discriminations. For
instance, a relief worker was assigned to a technical research
project because he had attended a college for two years and
expressed an interest in such work. He liked the prestige of
his assignment and refused other kinds of employment despite
the fact that no such jobs were available to one of his
qualifications. Tests showed that his present educational skills
were about those of the average eleventh-grade pupil and
among the lowest .1 per cent of college sophomores. When
shown his results on the chart by the counselor, he decided to
enter a defense training class and now has a good job as a
production worker. His social worker was also glad to have
the support of objective data based upon the measurements of
abilities of the worker rather than upon his observations or
opinions.

A similar instance was that of a star athlete who graduated
from a high school with the respect of both faculty and
student body. He wanted to become an apprentice machinist
and felt that he was rejected because he was a Negro. His
school record showed C’s and D’s in most subjects. Moreover,
these marks were obtained in non-college subjects and in the
slow-learning class sections. His scores in aptitude tests and
in the Cooperative tests in English, Social Studies, Mathematics,
and Science all placed him in the lowest fifth of the
class. On the basis of such evidence he could be shown that
many non-Negroes would also be rejected even with much better
qualifications. However, his qualifications would easily
admit him to a defense training class and his excellent character
and personality record, as well as his physical assets,
would make him desirable as a production worker in the same
plant where he had been rejected as an apprentice. Moreover,
his pay as a production worker would be much higher than as
an apprentice. In addition, if he still wanted to become a
machinist, he could enroll in free evening courses in mathematics,
drawing, science, and machine shop practice, and when
he had mastered the necessary fundamentals and skills, he
could be examined and certified by the school authorities, by
the state civil service, or by the state director of apprentice
training, and on the basis of such evidence he should be able
to obtain employment as a machinist. In recent years the
placement office has had requests from a number of employers
for at least a few Negroes of superior qualifications to indicate
that they have abandoned the practice of race discrimination.
They were unwilling to employ them or to reject them because
they were Negroes, but would be glad to employ them if they
could demonstrate that they were as well or better qualified
than others who were being employed.

Accusations of prejudice and discrimination are the per¬ 
petual alibis of the unsuccessful candidate for any job. The 
only effective answer is the more general use of objective 
methods in determining the qualifications of candidates and 
more objective and accurate production records in determining 
promotions. 





THE PREDICTION OF SUCCESS OF STUDENT 
ASSISTANTS IN COLLEGE LIBRARY WORK 


GRACE M. OBERHEIM 
Iowa State College Library

THERE are many problems which arise in connection with
the selection and use of student assistants in college 
library work. The specific problem which led to this study was 
the difficulty of obtaining student assistants in the Loan De¬ 
partment of the Iowa State College Library who were capable 
of doing the required work successfully. It was thought that 
high scholastic grades and high scores on certain selected tests 
would have a positive relationship to successful work in the 
library. 

A testing program was set up at the Iowa State College 
Library during the college year 1937-38. The purpose of this 
program was to discover the extent to which academic grades 
of student assistants and scores made on certain selected tests 
might be used to predict success in various types of college 
library work. The results reported here are based upon data 
obtained for 307 undergraduate student assistants^1 who
worked in all departments of the college library during the 
college years 1937-38 through 1939-40. The predictive indices 
available for this group included the American Council on 
Education Psychological Examination scores, the National
Institute of Industrial Psychology Clerical Test (American 
Edition) scores, and the grade-point averages for one quarter 
of college work. In addition scores on the Bell Adjustment 
Inventory were available for 69 assistants who were included 
in the group of 307. 

All students take the Psychological Examination when they
enter college. The grade reported in the study was the
grade-point average^2 made by the student assistant for the quarter in
which he took the Clerical Test at the library. The library
rating was made at the end of the same quarter.

^1 The group was composed of 174 freshmen, 86 sophomores, 37 juniors and
10 seniors. Of this number, 71 were women and 236 men. One hundred and
thirteen students (40 women and 73 men) had had some previous library
experience at the time they took the Clerical Test while 194 (31 women and 163 men)
had had no previous library experience. “Library experience” as used in the
study may be defined as more than four weeks of work, usually on a part-time
basis, in a library. An assistant who had worked four weeks or less was
considered in the group “without library experience.”

The criteria of success for student assistants in college
library work were (1) ratings made by librarians who supervised
the work of assistants and (2) records of student
promotions within the library.

The graphic rating scale used was adapted from one described
by Filer and O’Rourke (3) and by Symonds (4,
6d-(^8), and instructions similar to those described by Symonds
were given to each rater. The staff member best acquainted
with the student's work and directly in charge of it rated the
assistant. The ratings were scored by assigning numerical
values to the five different divisions of the rating line, with a
possible range from 0 to 4 on each division and a total range
from 0 to 40 on the ten items.
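
The scoring rule just described can be sketched in a few lines. This is only an illustration, assuming ten items each marked on a five-division line valued 0 through 4; the function name and the sample marks are hypothetical and are not taken from the study.

```python
def total_rating(item_scores):
    """Sum ten item scores, each an integer from 0 to 4 (total range 0 to 40)."""
    if len(item_scores) != 10:
        raise ValueError("the scale described has exactly ten items")
    for score in item_scores:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError("each item is scored 0, 1, 2, 3, or 4")
    return sum(item_scores)


# Hypothetical marks for one assistant on the ten items.
example_marks = [3, 2, 4, 3, 3, 2, 1, 4, 3, 2]
print(total_rating(example_marks))  # prints: 27
```

An assistant marked in the highest division on every item would receive 40, and one marked in the lowest division throughout would receive 0.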

Only one rating was used in the study since it was found
that very often there was no second person equally competent
to make the rating. As a measure of the reliability of the
ratings used, a second rating was made for two smaller groups
of assistants included in the group of 307. Twenty students
in the Catalog Department were rated independently by a
second rater, and twenty students were rated by the Assistant
Loan Librarian as well as by the Loan Librarian. Reliability
coefficients of .76 and .77 were obtained. A frequency
distribution of the total scores made on the ratings for the 307
assistants was made, and the chi-squared test indicated no
significant departure from normality.
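
A reliability coefficient of this kind may be illustrated as follows. The sketch assumes that the coefficient is the product-moment correlation between the total ratings given by two independent raters, which the study does not state explicitly; the ratings shown are invented for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total ratings (0 to 40) given to the same six assistants
# by two independent raters.
first_rater = [31, 24, 18, 35, 27, 22]
second_rater = [29, 26, 20, 33, 24, 25]
print(round(pearson_r(first_rater, second_rater), 2))
```

Close agreement between the two raters yields a coefficient near 1; ratings unrelated to each other yield a coefficient near 0.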

To be promoted, an assistant must not only have had some
experience in the library, but he must have also the ability to
perform more difficult tasks than those assigned to him when
he began his library work. Student assistants who prove to be
accurate in their work and who have the necessary personal
qualifications are eligible for promotion. The records of student
assistants who were promoted were obtained from the
pay rolls.

^2 See Iowa State College Catalog, 1939-40, p. tlS.

The statistical methods used in analyzing the data in¬ 
cluded a study of the significance of differences of means and 
of the significance of correlation coefficients, and the use of
regression coefficients. 

Relationship between the Predictive Variables 
and the Library Rating 

Table 1-A shows means and standard deviations of the
four variables for the total group of 307 student assistants.
The comparison of means in Table 1-B indicates that although
the women with library experience made higher scores on the
four variables than the women without library experience, the
mean differences were not significant. Men with library
experience made higher scores on the four variables than men
without library experience and the mean differences for the
Psychological Examination and the Library Rating were found
to be significant. Women with library experience made higher
scores on tests and grades and were given a higher rating than
men with library experience, but the mean difference was found
to be significant for the Clerical Test only. Women without
library experience made significantly higher scores on the
Clerical Test and were given a significantly higher rating than
men without library experience.


TABLE 1-A

MEANS AND STANDARD DEVIATIONS ON FOUR VARIABLES
FOR TOTAL GROUP OF STUDENT ASSISTANTS

                    N.I.I.P.              A.C.E.     Library
            No.     C.T.       Grades     P.E.       Rating

Mean        307     73.6       2.3        98.2       24.3
S.D.                19.59       .69       22.09       6.27


TABLE 1-B

COMPARISON OF MEANS ON FOUR VARIABLES FOR GROUPS OF STUDENT
ASSISTANTS CLASSIFIED ACCORDING TO LIBRARY EXPERIENCE AND SEX

                                     N.I.I.P.              A.C.E.     Library
Group                        No.     C.T.       Grades     P.E.       Rating

Men with
  library experience          73     74.7       2.434      101.9      25.4
Men without
  library experience         163     69.8       2.274       94.1      22.9
Difference                            4.9        .160        7.8*      2.5*

Women with
  library experience          40     83.2       2.435      107.8      26.5
Women without
  library experience          31     78.5       2.351       98.7      26.1
Difference                            4.7        .084        9.1        .4

Women with
  library experience          40     83.2       2.435      107.8      26.5
Men with
  library experience          73     74.7       2.434      101.9      25.4
Difference                            8.5*       .001        5.9       1.1

Women without
  library experience          31     78.5       2.351       98.7      26.1
Men without
  library experience         163     69.8       2.274       94.1      22.9
Difference                            8.7*       .077        4.6       3.2*

(*Indicates that the mean difference is significant)

Table 2 shows the correlation coefficients for the three
independent variables with the library rating for the total group
and for the subgroups. The correlation coefficients are positive
but not high.


TABLE 2

CORRELATION COEFFICIENTS OF THREE VARIABLES WITH LIBRARY RATING
FOR GROUPS OF STUDENT ASSISTANTS CLASSIFIED ACCORDING
TO SEX AND LIBRARY EXPERIENCE

                                      First-order correlation
                                           coefficients
                           No. in   N.I.I.P.              A.C.E.     Multiple
Group                      Group    C.T.       Grades     P.E.       R

Women with
  library experience         40     .205       .256       .293
Women without
  library experience         31     .249       .479*      .311
Men with
  library experience         73     .487*      .250*      .112
Men without
  library experience        163     .351*      .424*      .246*
Total Group                 307     .393*      .377*      .263*      .456*

(*Indicates that the correlation coefficient is significant)



TABLE 3

CORRELATION COEFFICIENTS OF FOUR VARIABLES WITH LIBRARY
RATING, AND INTERCORRELATIONS
(Sample Fall Quarter 1939, N = 69)

                         Clerical   Adjustment              Library
Variable                 Test       Inventory    Grades     Rating

Psychological Exam.      .698       -.020        .570       .417
Clerical Test                       -.070        .501       .622
Adjustment Inventory                             .0001      .03
Grades                                                      .468


Table 3 shows correlation coefficients and intercorrelation
coefficients for a group of 69 students who were given an
additional test, the Bell Inventory, during the Fall Quarter
1939. In the hiring and selection of student assistants in
college library work, the measurement of ability is of prime
importance. However, assistants must be able not only to do
the work assigned, but they must also be able to get along
with people and to have certain characteristics such as
co-operativeness and dependability. Bell (1, 102-104) and Tyler
(5) have reported very low correlation coefficients between
the Adjustment Inventory and the Psychological Examination
and the Inventory and grades. Since the rating scale used in
this study contained items such as co-operativeness, initiative, 
and dependability, it was thought that a higher correlation 
coefficient might be obtained with the library rating than with 
the measures of intelligence. However, the correlation co¬ 
efficient of .03 between the Adjustment Inventory and the 
Library Rating is so low as to be of no value as a predictive 
device for the selection and hiring of assistants. Negative 
correlations were obtained between the Adjustment Inventory 
and the Psychological Examination, and between the Adjust¬ 
ment Inventory and the Clerical Test. Bernreuter (2) points 
out that this type of inventory should not be used in selecting 
individuals for jobs since the complete co-operation of the 
individual is essential and this complete co-operation is difficult 
to obtain in a program which involves selection for jobs. 
Further research is needed to discover some instrument which
will measure more satisfactorily the personal qualifications of
the applicant. Very often it is possible to obtain through a
carefully conducted interview an estimate of personal
qualifications which would make for or prevent the success of an
applicant.

Standard regression coefficients for the total group and
for the Fall Quarter group are shown in Table 4. These
standard regression coefficients indicate that the Clerical Test
contributed most to the library rating with grades second and
the Psychological Examination third for the total group and
for the subgroup, Fall Quarter 1939.


TABLE 4

STANDARD REGRESSION COEFFICIENTS SHOWN FOR THREE PREDICTIVE
VARIABLES FOR THE TOTAL GROUP AND THE FALL
QUARTER 1939 GROUP

                               N.I.I.P.              A.C.E.
Group                  No.     C.T.       Grades     P.E.

Total                  307     .305       .265       -.047
Fall Quarter 1939       69     .593       .244       -.135
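
The relation between the correlations of Table 3 and the Fall Quarter coefficients of Table 4 may be illustrated by a short computation. The sketch assumes the usual textbook relation, in which the standard regression coefficients are the solution of the normal equations formed from the intercorrelations of the predictors; the study does not describe its own computing procedure, and the function and variable names below are hypothetical.

```python
from math import sqrt


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the inputs are left unchanged.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Order of predictors: Clerical Test, grades, Psychological Examination.
# Intercorrelations and correlations with the rating as read from Table 3.
predictor_corr = [
    [1.000, 0.501, 0.698],
    [0.501, 1.000, 0.570],
    [0.698, 0.570, 1.000],
]
rating_corr = [0.622, 0.468, 0.417]

betas = solve_linear_system(predictor_corr, rating_corr)
# Multiple correlation implied by the coefficients: R^2 = sum(beta * r).
multiple_r = sqrt(sum(b * r for b, r in zip(betas, rating_corr)))
print([round(b, 3) for b in betas], round(multiple_r, 3))
```

Computed in this way, the coefficients fall close to the Fall Quarter row of Table 4 and preserve the same order of the three predictors; small discrepancies are to be expected, since the published correlations are rounded to three places.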

Relationship between Predictive
Variables and Promotions

While student assistants who make average scores on tests
or average grades and are given an average rating may get
along fairly well in the library, the average does not represent
a satisfactory goal. There is constant need for students who
can do better than average work, and devices which will help
to indicate the more promising assistants at the beginning of
their work are important. It may be seen from Table 5 that
the means of the group promoted are higher on all the
variables than the means of the group not promoted. Not only
are the means higher but the differences between means were
found to be highly significant.


TABLE 5

SIGNIFICANCE OF MEAN DIFFERENCES ON FOUR CRITERIA BETWEEN THE
GROUP PROMOTED AND GROUP NOT PROMOTED

                                   N.I.I.P.              A.C.E.     Library
Group                      No.     C.T.       Grades     P.E.       Rating

Promoted                    42     88.26      2.76       112.38     29.12
Not promoted               265     71.23      2.26        95.98     23.50
Diff. of Means                     17.03       .50        16.40      5.62
S. D. of Diff. of Means             3.25       .11         3.77      1.04

Diff. of Means
-----------------------             5.3       4.5         4.3       5.4
S. D. of Diff. of Means
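
The last row of Table 5 is the ratio of each difference between means to the standard deviation of that difference. A minimal sketch of the computation follows, using the figures read from the table; the function name is hypothetical.

```python
def critical_ratio(diff_of_means, sd_of_diff):
    """Difference between two means divided by the standard deviation of that difference."""
    if sd_of_diff <= 0:
        raise ValueError("the standard deviation of the difference must be positive")
    return diff_of_means / sd_of_diff


# Differences and their standard deviations as read from Table 5.
table_5 = {
    "Clerical Test": (17.03, 3.25),
    "Grades": (0.50, 0.11),
    "Psychological Examination": (16.40, 3.77),
    "Library Rating": (5.62, 1.04),
}
for name, (diff, sd) in table_5.items():
    print(name, round(critical_ratio(diff, sd), 2))
```

Ratios recomputed from the rounded entries of the table may differ by a tenth or so from the printed ratios, which were presumably obtained from unrounded values.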

Obviously not all of the students promoted were equally
capable. Some of the scores on tests and grades of certain
assistants in the group promoted fall below the average of the
group not promoted. By eliminating the lower 25 per cent of
scores made on the Clerical Test, the Psychological Examination,
and grades for the group promoted, all scores below the
average of the group not promoted were eliminated. The
score on each test and the grade which divide the upper 75
per cent of the group promoted from the lower 25 per cent
of the group was taken as the critical score and critical grade.
The critical score is 76 for the Clerical Test and 104 for the
Psychological Examination (1938 edition) and the critical
grade is 2.38.
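
The procedure for fixing a critical score may be sketched as follows. The scores in the example are invented, the function names are hypothetical, and the rule that an applicant "meets" a critical value when he equals or exceeds it is an assumption; only the three critical values themselves (76, 104, and 2.38) come from the study.

```python
def critical_score(promoted_scores, lower_fraction=0.25):
    """Lowest score that remains after dropping the lowest `lower_fraction` of scores."""
    if not promoted_scores:
        raise ValueError("at least one score is required")
    ordered = sorted(promoted_scores)
    # Number of scores to drop from the bottom (truncated toward zero).
    n_dropped = int(len(ordered) * lower_fraction)
    return ordered[n_dropped]


def meets_critical_values(clerical, psychological, grade_point):
    """Compare one applicant with the critical values reported in the study."""
    return clerical >= 76 and psychological >= 104 and grade_point >= 2.38


# Hypothetical Clerical Test scores for a small promoted group.
promoted = [58, 66, 72, 79, 85, 90, 94, 101]
print(critical_score(promoted))  # prints: 72
print(meets_critical_values(80, 110, 2.50))  # prints: True
```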

Since the results show that mean differences between the
group promoted and the group not promoted are significant,
high scores on the tests used and high grades may be
considered to be of value as predictive devices in the selection of
student assistants for college library work. The critical scores
determined for this group may be used as a guide in selecting
assistants in the future who would have high promise of
success.


REFERENCES 

1. Bell, Hugh M. The Theory and Practice of Personal Counseling,
with Special Reference to the Adjustment Inventory. (rev. ed.)
Stanford University Press, 1939.

2. Bernreuter, R. C. “The Present Status of Personality Trait Tests,”
Educational Record, XXI (1940), Supp. 160-171.

3. Filer, H. A. and O’Rourke, L. J. “Progress in Civil Service Tests,”
Journal of Personnel Research, I (1923), 484-520.

4. Symonds, Percival Mallon. Diagnosing Personality and Conduct.
New York: Century Publishing Co., 1931.

5. Tyler, Henry T. “Evaluating the Bell Inventory,” Junior College
Journal, VI (1936), 353-357.





THE ADMINISTRATION OF GROUP TESTS 


ERNEST M. LIGON 
Union College 


“ANYBODY with a strong voice who can read can give
group tests.” Unfortunately this opinion is very widely
held. Even a superficial consideration of the responsibility of
a group test situation should quickly dispel such an idea.
Actually, good group testing is much more difficult than
individual testing. The most perfect tests available are as
valueless without good examiners as the best surgical instruments
without good surgeons.

Among the prerequisites of good group testing are: that
all of the subjects understand the instructions, that they all 
work throughout the assigned time at their optimum level of 
achievement, that they are in no way helped, hindered, or 
distracted by one another, that they do not quit trying or omit 
any section of the test, that examiners give instructions ade¬ 
quately and in a stimulating, effective tone of voice—not a 
dull, bored monotone—and that proctors are observing every 
movement in the group, stimulating lagging souls, inhibiting 
wandering eyes, and detecting failure to follow instructions. 
Literally millions of group tests are administered every year. 
On scores derived from them an equal number of judgments 
are made affecting in some way, often extensively, the lives of 
those taking them. 

An examiner giving an individual test can easily determine
how his subject is reacting to the test problems. A group test
examiner has that responsibility for as many subjects as are
in the room, which may range from only a few to several
hundred. Now that group tests are being called upon to play such
a large role in our war effort, it behooves us more than ever
to make every effort to give them effectively, as well as to
construct them carefully and score and interpret them
accurately. Every man who, due to no fault of his own, makes
a score on one of these tests which does not reflect his true
capacity, may thereby be put in the wrong place. It does not
require proving that in so specialized an organization as the
modern mechanized army, a very important part of its success
depends on getting the right man in the right place.

This paper has been prepared on the basis of two types of
evidence. In the first place, the author has had several years
of experience in administering many different kinds of group
tests as well as individual tests. During this period, it has also
been necessary to train many students to do so. In the second
place, although the literature contains very little either in
periodicals or in books on measurement concerning this phase
of measurement procedure, almost all group tests include in
their manuals of procedure such instructions as seem desirable
for their administration. A number of these have been
examined and the principles included in them collected and
organized into this paper. The ones used were selected simply
because they were ones with the administration of which the
author has had wide experience. It seems probable that they
constitute a fairly representative sample.


I 

The aim of a group test is to measure differentially a
group of homogeneous^ individuals with respect to some
simple or complex variable.

Basic requirements, if scores are to be significant, presuppose
that all subjects

(1) give their optimum performance, and

(2) do so for the full period of the allotted time.

^By homogeneous is meant that group tests are always administered to a
group selected because all of its members can be measured with respect to the
variable or variables involved and by means of the test being used.



These, in turn, presuppose for the examiner 

(1) that he make perfectly clear to the subjects what 
they are to do and how they are to do it; 

(2) that he stimulate them to do their best;

(3) that he and his proctors note and make adequate 
adjustments for individual deviations, such as 
mental confusion, indifference, impulsiveness, day¬ 
dreaming, making right responses in a wrong way, 
cheating, and the like, which might destroy the 
validity of test results. 

II

The most common sources of error in group test adminis¬ 
tration and methods for controlling them follow. 

(1) Misunderstood instructions

A very common misconception is that if printed instructions 
are read word for word, this is sufficient to hold conditions 
constant. The fact is that, unless the test itself consists of
instructions, conditions are held constant only when every subject
starts to work with a complete and accurate understanding
of what is expected of him. Furthermore, the method of reading
instructions is quite as important as their wording, so far
as the subjects’ ability to comprehend them is concerned. All
instruction manuals ought to be marked as to emphasis and
pauses, as well as carefully worded. Such a practice would add 
substantially to the accuracy of group tests. 

The speed of reading instructions should be a function of 
the speed of comprehension of the subjects. The alert ex¬ 
aminer, by watching his subjects, can know when his subjects 
understand what he has just said. Some instructions, therefore, 
can be read more rapidly than others, and with some groups of 
subjects more rapidly than with other groups. The time for 
reading should be a little more than adequate for all subjects. 

Enunciation is important, considering that in most large 
groups, such as in the army, several “dialects” will probably 
be represented. All must understand. 


The auditory acuity of the subjects must be considered.
Instructions are more certain to be understood if the subject
can read them silently as the examiner reads them aloud, thus
emphasizing both visual and auditory cues. Conversely,
however, the examiner must read them aloud. To ask subjects to
read instructions silently without oral reading by the examiner
almost always results in errors in reading and even failure to
read all of them. Furthermore, as previously indicated, the
emphases indicated by the reader help in understanding. Visual
illustration of the mechanism of recording answers will help
to avoid many clerical errors.

Delayed reaction instructions ought to be repeated near
the time during the test when they are to be carried out.
Instructions about what to do when a certain part of the test is
reached are almost certain to be forgotten by some subjects
unless they are reminded of them.

Proctors should check to be sure that all of the subjects
do understand the instructions, by seeing, for example, how
they answer the first two or three questions, usually simple
ones which all ought to answer.

In some tests, ability to understand instructions is part of
the test. If this is to be the case, there should be a great many
different instructions separately scorable. Otherwise, failing
to understand them may produce an undistributed minimum.
Occasionally in screen tests this may be a desirable condition.
It is also true, at the other extreme, that if instructions are so
complete as to constitute answers to the test questions, there
may be an undistributed maximum. It seems unwise, however,
to include in the time limits of a test nonscalable sets of
instructions.

Added afterthought directions given in the middle of a
test by an examiner who has not adequately prepared his
instructions beforehand constitute an important source of 
distraction. 

The amount of practice necessary to make instructions
clear to subjects will vary among various groups. It needs to
be adequate for the poorest of the subjects. One hundred per
cent understanding is necessary if subjects are to be measured
accurately.

(2) Careless Errors 

Speed and accuracy are two distinct qualities. Subjects 
should know how much each is to be weighted in the scoring 
of a given test. For example, a score based on right minus 
wrong on a clerical aptitude test can be raised appreciably by 
working very rapidly even though the number of errors is 
thereby increased. However, most employers of clerical work¬ 
ers would rather have employees who get sixty correct answers 
out of sixty attempted than eighty-five right out of one hun¬ 
dred attempted, although the score of the latter is higher than 
the former. 
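
The arithmetic of the example just given may be set out briefly. The sketch assumes the simple "right minus wrong" rule named in the text, with every attempted item counted either right or wrong; the function name is hypothetical.

```python
def rights_minus_wrongs(attempted, correct):
    """Score a test as the number right minus the number wrong."""
    if not 0 <= correct <= attempted:
        raise ValueError("correct answers must lie between 0 and the number attempted")
    wrong = attempted - correct
    return correct - wrong


careful = rights_minus_wrongs(attempted=60, correct=60)    # no errors
hasty = rights_minus_wrongs(attempted=100, correct=85)     # fifteen errors
print(careful, hasty)  # prints: 60 70
```

The hasty worker thus scores 70 against the careful worker's 60, although he has made fifteen errors and the careful worker none.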

Persuading subjects who finish before the time is up to
reread the questions and check the answers is a help in
avoiding careless errors. Proctors need to be alert to the
possibility of subjects overlooking large sections of the test.

Whether or not subjects are to be encouraged to guess on 
multiple choice questions should be standardized. It is com¬ 
mon procedure for examiners to warn the subjects that a 
wrong guess counts off more than an unanswered question.
They neglect to add that a right guess counts more than an 
unanswered question. 
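
Both halves of the warning can be weighed together by computing the expected effect of a single blind guess. The sketch below is an illustration only: the text does not name a particular scoring rule, so the penalties used (one point per wrong answer, or one divided by the number of choices less one) are assumptions, as is the function name.

```python
from fractions import Fraction


def expected_gain_from_guess(n_choices, penalty_per_wrong):
    """Expected change in score from one blind guess, versus leaving the item blank."""
    p_right = Fraction(1, n_choices)
    p_wrong = 1 - p_right
    # A right answer adds one point; a wrong answer subtracts the penalty.
    return p_right * 1 - p_wrong * penalty_per_wrong


# Five-choice items with the penalty 1/(k - 1): a blind guess neither helps nor hurts.
print(expected_gain_from_guess(5, Fraction(1, 4)))  # prints: 0
# Two-choice items scored rights minus wrongs: again no expected gain or loss.
print(expected_gain_from_guess(2, 1))  # prints: 0
# Five-choice items scored rights minus wrongs: a blind guess loses on the average.
print(expected_gain_from_guess(5, 1))  # prints: -3/5
```

Whatever rule is adopted, the point of the text stands: the subjects should be told the same thing about guessing on every administration.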

Careless errors which are a result of the faults of the 
examiner need to be watched for. Timing errors are the most 
common of these. A full-face second hand is a necessary part 
of a group testing time-piece. Extra pencils need to be at
hand, so that securing a new one from a proctor does not 
require a large amount of time. If subjects have two pencils 
to begin with, the possibility of this difficulty is decreased. 
Pens should never be used, since the inevitable corrections 
made by a subject become increasingly difficult or even impos¬ 
sible to interpret. 

Correct filling in of the forms on the front of the usual
test blank is difficult. These need to be kept at a minimum and
filled out systematically under instructions from the examiner.
The date should be stated or written in a prominent place in
the front of the room.

Distractions are to be eliminated as far as possible. A
distraction is a subjective concept. It is whatever distracts the
subject. Too quiet a room may often be more distracting than
one which is noisy. Visual distractions are usually more
important than auditory ones. People walking by where they can be
seen, proctors’ too obvious observation, neighbors turning to
later pages of the test too soon, some subjects leaving the
room too early, and excessive materials on desks are a few of
the more common distractions.

(3) Low Motivation

If group test scores are to be adequate measures of what
they try to measure, they presuppose that the subjects do their
best for the full time limit of the test. Unless the test
measures motivation, maximum motivation is a prerequisite for
accurate scores. There are at least three types of motivation
which, when characteristic of group test subjects, tend to
decrease the reliability of the results.

(a) Sense of inadequacy. Subjects often see that there
are many problems on the test which are impossible for them
to answer and infer therefrom that they are failing the test
and so give up without trying. This is due to their experience
with school examinations, in which they are expected to know
all of the material asked for. A statement of the nature of
mental tests will often remove much of this misconception,
emphasizing especially that a good test must be long enough
and sufficiently difficult that the best subject cannot make a
perfect score. Then, too, a statement to the effect that too
high a score is quite as bad as too low a score for a subject
will help, emphasizing the fact that the accurate score is the
only good one. This source of error is especially characteristic
of achievement tests in fields in which the subject has had no
formal training, such as mathematics and science, if that is the
case. The subject often gives up without trying, whereas a
genuine effort would often produce astonishingly good scores,
even if the subject has had few, if any, formal courses in
these fields.

(b) Sense of indifference. Many subjects may have the 
idea that the test is not important and that it does not make 
any difference what they make on it. There may be initial indifference,
or a lack of enthusiasm for even starting a test, and
there may be executive indifference, or a decrease in enthusi¬ 
asm as the testing period progresses. 

Overcoming initial indifference depends on (a) the attitudinal
preparation of the subjects for the test, and (b) the
attitude inspired by the test examiner in the beginning of the 
test. 

As subjects are prepared for a test, they should be told the
purpose of the test; what it tests and how one can know that
it tests it. A brief statement is often very effective, pointing
out that tests are constructed by experiment and not by armchair
theory and should be criticized only when experimental
data are available. This is especially important with highly
intelligent subjects. A discussion of the nature of direct and
indirect tests, with a clear statement of which type is being
administered, is valuable. If a subject knows that he cannot
predict the right answer and that he may only destroy his
chances of getting a good score by endeavoring to do so, he is
less likely to try to answer the questions in whatever way he
thinks may get a good score instead of giving straightforward
answers.

The subject should also be informed as to the use to be 
made of the results. If they are confidential, to be given to 
no one except him or with his permission, his attitude is improved
by assuring him of this fact. Honesty and frankness
with the subjects is almost always an asset in getting good 
motivation. 

Then, too, the attitude of the examiner is important. If
he by his posture and tone of voice indicates that he is bored
by the whole procedure, he will probably inspire this same
attitude in his subjects. If he stands erect and alert while
reading directions, and speaks with a tone of enthusiasm in
his voice which suggests that he thinks the test is interesting,
the effect on his subjects will be appreciable. Just as in every
individual test every subject is the "most important" subject,
so in group tests, every group is the "most important" group.

A good examiner never lets down.

The intensity of the examiner's voice needs to be controlled.
However, subjective intensity is not always measured
in decibels. It is a well known principle in public speaking
that to lower the voice both in pitch and loudness is effective
in getting attention. This probably is due to the fact that it
is the very opposite of the common procedure. An incisive,
firm, ringing signal "go" does much to produce good initial
motivation. When loud-speakers are used, their value lies in
the fact that more objective intensity can be gotten without
greater apparent intensity on the part of the examiner. In
any case, the attitude and posture of the subjects should be
one of alert attention at the signal "go."

Overcoming executive indifference is even more difficult
than overcoming initial indifference. The most important factor
in this respect, aside from the personality traits of the
subjects themselves, is the internal nature of the test. In
young children, every test needs to be put on a game level,
in order to elicit best efforts. If tests are of any considerable
length, this also holds true even for those given to adult
groups. Even when the subjects have the best intentions and
the most complete awareness of the importance of the results,
it is difficult not to let down with continuing boredom. Test
construction and the organization of test batteries ought to
be based on this inherent factor and provide for the inclusion
of interest stimuli at frequent intervals. Test reliabilities
would thus be improved. A spirit of competitiveness, if not
overdone, is of value in executive motivation. It must be
geared to the type of subject with whom the test is used, but
is always stimulating when used wisely.





The attitudes of the examiner and proctors, even during 
the times when the subjects are writing and no instructions are 
being given, are still a factor. If they relax and slouch around
in groups for non-test conversation, this will carry over to the 
subjects. Examiners and proctors who are alert during the 
whole testing period have an important role in the maintenance 
of motivation in their subjects. 

Mental fatigue is largely a product of imagination. Some 
groups become tired after fifteen minutes. Others will con¬ 
tinue at full speed for several hours. Subjects ought to be 
warned, in view of the importance of the results, about the 
shortsightedness of allowing fatigue and boredom to decrease 
the quality of their effort. They should be informed of the 
facts concerning the nature of mental fatigue and as to how 
long it is possible for the human mind to persist at a high level 
of effort and efficiency. 

(4) Mental confusion due to too great excitement 

It is possible, especially with some subjects, to get too great 
motivation as well as too little. The subjects need in the very 
beginning to be put at their ease, without a loss of desirable
motivation. The methods employed by every good individual 
tester can with modification be applied in group testing. Thus, 
an individual tester adjusts the speed and intensity of his voice 
to the speed and intensity of his subject. If the subject is 
dawdling he can speed him up by a slight increase of these 
qualities in his own voice. If the subject is obviously too 
excited, he can quiet him down by the slower speed and lower 
intensity of his voice. This can be done also by the effective 
group tester. 

Two common causes of overexcitement can be diminished 
by the good examiner. If the subjects worry about the tests a 
long time in advance, it may be wiser not to forewarn them at 
such long periods. No forewarning at all, on the other hand, 
might produce a complete mental disintegration in some sub¬ 
jects. Instructions which read “work rapidly,” if given with 
too great fervor, may sometimes decrease the efficiency of certain
subjects. Many people cannot work well under pressure
of time [...] and inspired confidence [...] in his subjects
without overexciting them.

When the instructions or the paraphernalia, such as
machine-scoring equipment, are too complicated, plenty of
time needs to be taken to familiarize the subjects with them
to insure confidence in their use.

Observing and copying one's neighbor's work is only one
of the forms of cheating done by subjects on group tests. It is,
furthermore, the easiest to control. It should simply be made
impossible. Honor systems should never be used in group
testing, since relative scoring can be made invalid by even a
few dishonest subjects. But there are other forms of dishonesty.
Precheating is very common. Often people come to
psychologists to "get a copy of all the tests to be had" so as
to be ready for some coming group tests. Obviously, most
tests cannot be prepared for. But subjects attempting to do
so show by their attitude their unwillingness to give the most
desirable type of co-operation. It stands to reason that to
whatever extent a subject succeeds in this sort of effort, he
destroys the value of the test to him as an accurate indication
of his ability or aptitude. This type of cheating can best be
eliminated by the process of urging upon the subjects the fact
that a high score is a bad score unless it is accurate.

Getting information concerning the tests from individuals
who have taken them is another dishonesty source of error.
Subjects ought to be informed of the possible consequences
which may arise from such action. If, for example, by this
means one succeeds in getting into the air corps who would
have been rejected if properly tested, and is killed in training,
his informer can hardly be thought of as having done him a
favor.

Such test procedure methods as are involved in requiring
“pencils up” between tests and “folding back booklets so that
only one page at a time is visible” are designed to eliminate
errors belonging in this classification. Clear instructions as to
whether it is permissible to proceed to another page or go back
to a former page without further signal ought to be given
both orally and on the printed page.

Many subjects get help from the examiners themselves. 
Some examiners indicate correct answers in their tone of voice. 
Others will answer individual questions which may give the 
questioner an undue advantage. Such questions, if answered 
at all, ought to be answered so that all can hear. An example 
of how the examiner’s voice can help the subject is found in 
giving the digits test in the Binet. If the digits are grouped or
read too rapidly the test is much more easily passed. 

It is well for the examiner to have a correct attitude con¬ 
cerning the nature of his job. The job of a teacher is to help 
his students learn. The job of an examiner is to measure, not 
to teach. 

(6) Not working full time

If time is a factor, a test should be so constructed that 
the best subject cannot finish within the prescribed time limit. 

It is common procedure in achievement tests to give ade¬ 
quate time for even slow subjects to finish. This results in 
the best subjects finishing early. If they leave the room, this
becomes a serious distraction to the slower ones. It might be
desirable to include non-scorable questions to prevent this from
happening. It is true that holding subjects after they have
finished is usually not good for testing morale.

It is difficult on long tests for subjects not to let their
minds wander from time to time. This is a factor for the
proctors to deal with. If the proctors are alert, both by their
presence and their active efforts, they can keep the subjects
working consistently. The discussion of executive indifference
is also related to this point. Subjects, of course, should know
whether or not it is a timed test, and in the case of long tests 
should be warned at regular intervals as to the amount of time 
consumed or remaining for the test. 



Another factor which enters into the timing problem of
tests is that of mental set. When long tests covering entirely
different types of material are given successively, a longer
interval needs to elapse between them. This is not due primarily
to a fatigue factor, but to the need for changing the
mental set from one field to another. Teachers who have
taught two different courses in successive hours will recognize
the importance of this principle.

(7) Size of group and group inter-distractions

Good morale in a group is essential to maximum performance.
How large a group can be before distracting factors
enter in due to size, varies. One group of four hundred can be
tested better than another group of fifty. The constitution of
the group is a factor, as is the ability of the examiner. When
the members of the group do not know each other, morale is
more easily maintained than when they do, unless intergroup
rivalries can be used as a motivation. Groups of older age
levels are usually easier to control than younger groups.
Groups competing with each other have better morale than
those having no such sense of group solidarity. Telling a new
freshman class or group of draftees that they are competing
with preceding classes or groups is an incentive for group
morale.

Once group morale is lost, it is very hard to regain. Let 
there be a few sighs, whistles, groans, shufflings of feet, low- 
intensity grumblings, or catcalls and the situation for good 
group testing is almost hopelessly lost. The leadership of the 
examiner and the alertness of the proctors will play a large 
part in this.

III 

This paper has attempted to indicate the difficulties in¬ 
volved in the administration of group tests and to point out 
some methods for making them good measures of the variables 
involved. Every subject ought to leave the testing room feeling
confident that he has done his best, and that the score
assigned to him will be representative of him, even if he has
missed a large percentage of the questions. One does not pass
or flunk tests any more than he passes or flunks measures of
height and weight. It is the task of the examiner to make
this clear to each subject and get from him a sample of his
best performance. More thorough training of group testers
and a larger sense of their responsibility among them will
make the increasing use of group tests a far greater contribution
to the problems of adjustment than if the common notion
that “anybody can give group tests if he reads the printed
instructions word for word” continues to be the prevalent one.
It will be obvious that not all of these principles will be applicable
to all group tests, but it should be equally obvious
that administering any group test is difficult and, when well
done, constitutes a highly skilled act.






THE PURPOSE, ORIGIN, PLAN OF PROCEDURE, 
AND VALUES OF THE NATION-WIDE EVERY 
PUPIL SCHOLARSHIP TESTS¹

H. E. SCHRAMMEL
Kansas State Teachers College


Purpose 

In the field of measurements and the objective testing movement,
the Nation-Wide Every Pupil Scholarship Test is
one of the major significant developments. Because of the
far-reaching influence of these testing programs in this respect, it
was felt that it would be worth while to recount before the
membership of the National Association of Teachers of Educational
Measurements the major details of their purposes,
origin, methods of procedure, and values.

The purpose of the Every Pupil Scholarship Tests is the
promotion of scholarship. They are a valuable agency for
stimulating scholastic endeavor on the part of the students.
They stimulate good teaching as well as application to better 
learning. They vitalize education and make schools more 
worth-while in the lives of the students. 

Origin 

The Nation-Wide Every Pupil Scholarship Tests sponsored
by the Bureau of Educational Measurements of the
Kansas State Teachers College of Emporia had their origin
twenty years ago in connection with the county and state
Scholarship Contests sponsored by this college.

The first county contest in academic subjects, of which we 
find a record, was conducted by the Bureau of Educational 
Measurements in 1922 in Cloud County, Kansas. The first
State Scholarship Contest on record was conducted by the
Emporia State College in 1923.

¹Paper read at meeting of February 24, 1942, in San Francisco.

For a time the county contest movement was very popular
and the state contest movement also developed at a marked
rate. The latter is still a popular event in Kansas. This spring
the twentieth annual State Scholarship Contest will be conducted
by the Emporia State College at thirty conveniently
located centers of the state. Last spring over 3,500 students
from approximately 200 high schools participated in this
event.

In the county contests at first only a few of the best pupils
participated from each school. Hence the suggestion was made
that a plan be devised which would stress excellence in achievement
of the entire class in a curricular field. Thus in the spring
of 1924 two schools conducted a contest in one subject which
involved a larger number of pupils from each school, each
set of pupils taking the tests in their own school. The test
papers were provided and scored by the Emporia State College.
This was known as a dual contest. During the 1924-25
school year there was much demand for objective tests for use
in similar inter-school contests in which every pupil in one or
more specified subjects of each of the competing schools participated.
Because of the increased demand for new tests for
this purpose, a plan was devised for announcing in advance
the subjects and dates for which tests would be made available
for inter-school competition. During the first year that this
plan was in operation, many schools used the tests for inter-school
competition in which all the pupils of each school participated
and the median score was used as the measure of
comparison. A few schools, however, were not matched with
any other schools for competition, but they desired to use the
tests in order to be able to compare their results with the re¬ 
sults of the other schools for the purpose of determining the 
relative excellence of their own classes. Hence norms were 
computed from all scores in each subject and provided to all 
the participating schools. Thus the Every Pupil Contest idea
soon was superseded by the principle of a testing program in
which schools voluntarily participate in order to obtain an
objective measure of the attainment of pupils and classes. This
is the plan that has been retained in the main, with the introduction
from time to time of valuable perfections and improvements.
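
The comparison described above, class medians set against norms pooled from every participating school, can be sketched as follows. The school names and scores are invented, and the percentile-rank definition used here (cases below plus half of any ties) is an assumption; the article does not say how the Bureau computed its norms.

```python
from statistics import median


def percentile_rank(value, norm_scores):
    """Percent of norm scores below `value`, counting half of any ties."""
    below = sum(1 for s in norm_scores if s < value)
    ties = sum(1 for s in norm_scores if s == value)
    return 100.0 * (below + 0.5 * ties) / len(norm_scores)


# Invented pupil scores for three participating schools in one subject.
schools = {
    "School A": [52, 61, 47, 70, 58],
    "School B": [44, 49, 63, 51],
    "School C": [66, 72, 59, 68, 75, 61],
}

# Norms are pooled from every score reported, matched or unmatched.
norm_scores = [s for scores in schools.values() for s in scores]

# Each class is summarized by its median, the measure of comparison.
medians = {name: median(scores) for name, scores in schools.items()}

# List the schools from highest to lowest class median.
for name, class_median in sorted(medians.items(), key=lambda kv: -kv[1]):
    rank = percentile_rank(class_median, norm_scores)
    print(f"{name}: median {class_median}, percentile rank {rank:.0f}")
```

A school that was not matched with a rival can still place its own median against the pooled norms, which is the use the unmatched schools are said to have wanted.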

Plan of Procedure 

At present the plan of procedure of the Nation-Wide
Every Pupil Tests is as follows. The Bureau of Educational
Measurements annually announces two dates for the testing
programs. These come at the close of the first semester and
near the middle of April. Bulletins are sent out giving the
list of subjects for which new tests will be provided for each
testing date. This year thirty-four new tests were provided
for the testing program scheduled for January 8, and forty-four
tests will be provided for the next testing program announced
for April 8. Approximately 1,000 schools of the
country obtain tests at mid-year and 1,500 for the end-of-year
test. About three-fourths of a million copies of the tests are
used annually.

The Bureau secures competent volunteers to construct the
tests. These consist usually of teachers in Kansas and elsewhere
who are well trained in their respective curricular fields
and who have also had some training in the field of measurement.
The tests are edited at the Emporia State College by
test construction and curricular specialists. The printing is
done in the college print shop. Several dozen student assistants
are employed in the office of the Bureau and in the print shop
to handle the routine duties of typing, proofreading, filling 
and shipping orders, summarizing scores, computing percentile 
norms, invoicing, and keeping accounts. 

As test orders are received from all parts of the country, 
norms are computed from the scores reported by the partici¬ 
pating schools for each curricular field both for the whole 

403 



IDI’C\Ilfl.\\I AND MHASURKMENT 


j^rroiip am! also sci'arak'lv lur iiidividual states from which a 
sulliiient numlu-r iil scnres aie reported to warrant it. 

A summary bulletin of results is printed in compact form
and furnished gratis to all participating schools within three
weeks after the scheduled testing date.

For one of the recent Every Pupil Scholarship Testing
Programs it was found that the process of computing the
measures reported in the Summary Bulletin of Norms entailed
the handling of lt)d,112 pupil and class scores, the construction
of d(ls frequency tables, and the calculation of 3,429
statistical measures. The norms computed are based on from
several thousand to over ten thousand pupil scores for each of
the various school subjects and grades for which the tests are
provided.
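
The clerical work described here, building frequency tables and reading statistical measures from them, might look like the following sketch. The class interval width, the linear interpolation within an interval, the treatment of each interval as running from its lower limit up to the next, and the sample scores are all assumptions for illustration, not the Bureau's documented procedure.

```python
from collections import Counter


def frequency_table(scores, width):
    """Sorted (lower_limit, frequency) pairs for equal-width intervals."""
    counts = Counter((s // width) * width for s in scores)
    return sorted(counts.items())


def percentile_point(table, width, pct):
    """Score below which `pct` percent of the cases fall.

    Cases are assumed to be spread evenly within each interval, so the
    point is found by linear interpolation inside the interval reached.
    """
    total = sum(freq for _, freq in table)
    target = total * pct / 100.0
    cumulative = 0
    for lower, freq in table:
        if cumulative + freq >= target:
            return lower + width * (target - cumulative) / freq
        cumulative += freq
    return table[-1][0] + width


# Invented scores for one subject and grade.
scores = [12, 15, 17, 21, 22, 24, 25, 28, 31, 33, 34, 36, 38, 41, 45, 47]
table = frequency_table(scores, width=10)

for lower, freq in table:
    print(f"{lower}-{lower + 9}: {freq}")

quartiles = [percentile_point(table, 10, pct) for pct in (25, 50, 75)]
print(quartiles)
```

Once the table exists, each further measure is a short computation from the cumulative frequencies, which is why one table per subject and grade can yield many statistical measures.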

Validity and Reliability

What method is used to insure that the tests possess adequate
validity and reliability, the most important criteria for
evaluating tests, is a question worthy of consideration at this
point. While it is not claimed that the tests are fully standardized,
they do compare favorably in these respects with the
better standardized publications.

For insuring validity the following precautions are taken:
First, as a rule the test builders are persons who teach classes
in the curricular fields covered by the tests and who therefore
have a good perspective of the content to be included. Second,
content studies are made of textbooks and courses of study and
the test items budgeted in accordance with the content distribution.
Third, the editors consist of test construction specialists
and supervisors and teachers of curricular fields. Fourth,
cumulated studies of pupil responses on test items over a period
of years are available and used. Fifth, cumulative criticisms
from teachers who have used the tests over a period of years
are available and utilized. Sixth, in fields where studies from
previous editions of the tests are not available, preliminary
editions are provided and tried out in representative classes.



For insuring reliability, studies are made with preliminary
editions and on tests provided over a period of years. In a
field where tests are regularly provided, the degree of relia¬ 
bility may thus be predicted with a fair degree of accuracy. 

The results are used extensively by teachers, principals, and 
pupils. Many expressions are annually received from schools
as remote from the center of this movement as Montana,
Florida, Texas, Maine, and California. Teachers are eager to
learn how their classes rank in comparison with the classes of
dozens of other schools in which the tests were administered
on the same day during the current school term. Moreover,
they want this information without much delay. It must be
available promptly during the current school year to be of
maximum value to them. Thus far we have been able to live
up to the goal of mailing the results to the schools within
three weeks from the day the tests are administered.

Pupils, too, are eager to note how they rank in comparison
with the pupils in their own and other schools and they want 
this information before it becomes ancient history. By pro¬ 
viding objective measures which can be simply and intelligently 
interpreted, pupils are motivated to work for greater excel¬ 
lence in achievement in the various curricular fields. 

Many principals conserve the test results from year to year 
by filing the cumulative record of each pupil on a convenient 
card which has been provided. Because all scores are similarly
interpreted, this provides a wealth of material for use in coun¬ 
seling and personnel work. Some schools also issue certificates 
of excellence to pupils whose scores receive a high percentile 
rank. In this manner excellence of achievement is further 
stressed and motivated. 

Values Accruing from the Plan 

The values accruing from the Every Pupil Scholarship
Tests are manifold. These may be roughly classified as primary
and as secondary values. The major primary values are
the following:

A. The plan of Every Pupil Scholarship Tests stimulates
an intelligent interpretation of test results. For this
purpose, percentile scores are provided for each subject
and grade. Simple instructions are given for interpreting
class median scores, as well as individual scores,
into corresponding percentile scores. Because many
different methods of interpreting standard test scores
are resorted to, teachers are frequently at a loss in
regard to the procedure in making correct interpretations.
All too frequently valuable results from the use
of standard tests are misinterpreted or not interpreted
at all. Through our process of education in this respect
over a period of years, many teachers and principals
have exhibited that they have learned to make
correct and meaningful interpretations of their results
by the percentile score method and that they like the
simplicity of this method.

B. The plan motivates pupil and class effort because the
results are objective and the interpretation is intelligible
not only to teachers but can be presented graphically
and intelligently to pupils.

C. The plan motivates teachers in the construction and
use of better home-made tests.

D. The plan motivates teacher effort in better planning
of instruction, finding of weaknesses of instruction, and
so on.

E. The plan motivates diagnosis of weaknesses and
efficient remedial work in instruction.

F. The plan challenges the teacher to set up outcomes
objectives and to look for methods of determining the
extent to which such outcomes are realized. Too frequently
teachers are content in the assumption that
valuable intangible outcomes accrue from their instruction,
when in reality this may be far from the true
conditions. By being consistently exposed to objective
measurements, it is hoped that in time they will become
skeptical of these assumptions and seek to evaluate the
actual lasting values of their efforts.




Among the secondary values, the following are a few of 
the more obvious: 

A. The plan aids the test builders to become more pro¬ 
ficient in devising more efficient and valuable tests. 

B. On the Emporia State College campus the plan aids 
several dozen students annually who need employment 
to finance their college education. 


C. For the Measurements classes on the campus, the plan 
provides an invaluable laboratory. All of these stu¬ 
dents are in some concrete measure exposed to test 
production, standardization, use, scoring, interpreta¬ 
tion, and so on. 

D. The plan affords unusual opportunity for teaching stu¬ 
dents in Measurements classes and other employees the 
use of mechanical devices in handling statistical and 
other data. For example, the Bureau office contains 
hand and electric calculating machines, comptometer, 
clip boards, postal rate scales, postal wrapping device, 
and Dictaphones. The college print shop is equipped 
with linotype, rotary press, folding machine, stapling
machine, and other equipment essential to a modern
printing establishment. A large number of students
receive first-hand experience in the operation of these
devices in connection with their employment made possible
by the Nation-Wide Every Pupil Scholarship
Tests.

E. The plan of the Every Pupil Testing Programs makes 
it possible to standardize more and better tests than 
would otherwise be possible. Where normally scores 
for norms would be difficult to obtain, and at consider¬ 
able cost, a much larger sampling is possible for the 
norms and at practically no cost. This makes it pos¬ 
sible to pass the advantages on to the patrons in terms 
of more and better up-to-date tests at an unusually low 
cost of production. During the writer’s directorship 
of the Bureau of Educational Measurements, fifty- 
seven tests have been standardized. Most of these are 
published by the Emporia State College, but a few 
are published by some of the other leading test pub¬ 
lishers. 



F. For many schools, participation in the Nation-Wide
Every Pupil Scholarship Tests has furnished excellent
material for local school publicity. In this manner taxpayers
and patrons are made aware that their schools
are not only seeking to excel in the so-called extra
school work, but also in the regular curricular fields.

G. Through use of these tests, capable pupils, who otherwise
might be content to terminate their education upon
completion of junior or senior high school, make a
discovery of their own scholastic potentialities and are
inspired to seek further development in college.





A TEST FOR SELECTING AND TRAINING 
INDUSTRIAL TYPISTS 


CLIFFORD E. JURGENSEN
Kimberly-Clark Corporation 


The Typing Ability Analysis reported here was developed
to assist in training typists for industrial positions,
and to assist in hiring typists who can adequately fill
such positions. It was developed after tests commonly used 
in high schools and business colleges had been found to be 
valueless in predicting typing success in stenographic and 
secretarial positions in Kimberly-Clark Corporation. 

Analysis of reasons for the failure to predict job success 
by means of customary typing tests indicated that such tests 
do not emphasize sufficiently the major factors found in the 
industrial situation. Most typing tests provide an adequate 
measure of the mechanics involved in speed and accuracy when
typing from printed copy. They fail to measure the mechanics 
of handling paper, placement of paper, use of tools, etc. They 
also fail to measure the non-mechanical aspects of the job 
which are the major factors in differentiating between suc¬ 
cessful and unsuccessful typists. Important non-mechanical 
aspects which should be measured include following instruc¬ 
tions, noting and correcting errors, and the typing of diverse 
kinds of material rapidly, accurately, and in good form. Usual 
typing tests emphasize straight copying, and neglect the mosaic 
involved in a composite typing job. 

Construction of the Test 

Job analyses were inspected and conferences held with 
supervisors of industrial typists to determine the kinds of 
typing most often done, errors most often made, characteristics 

409 



KmU’/VnoNAI. AND PSYC'IIOUKJICAI, MEASUREMENT 


which dlifcrentiate bclweeti successful and unsuccessful typists 
etc. Supplementary data weie secured by inspection of the 
corporation’s files, and confeicnees with private secretaries 
and others in key typing positions. After analyzing and classi¬ 
fying the data obtained, files were inspectetl to obtain represen¬ 
tative samples of the various kinds of typing, 'I'hese samples 
Avere modified so they would be suitable for a test of typing 
ability, modification consisting primarily in the use of fictitious 
names and addresses. Standard typing procedures were sub¬ 
stituted for those unique to the corporation concerned, in order 
that persons unfamiliar with proceilurcs within the corporation 
would not be penalized unfairly. 

After collecting and adapting twenty samples of different
kinds of typing, instructions were prepared for each. Extreme
care was taken that these instructions emphasize factors
previously found important in differentiating between successful
and unsuccessful typists. The preliminary test form of twenty
items was administered to typists ranging from those known
to be unsatisfactory to highly successful private secretaries.
Tests were administered individually, and a record was kept
of the time required to complete each test part, errors made
in each part, comments on the test before it was scored, and
comments of the typist after she had been told how her test
results compared with those of other persons. This procedure
was followed with a group of thirty typists, half of whom were
fully experienced high caliber stenographers or secretaries,
and half of whom were inexperienced typists in the mailing
department. Unanimous agreement among supervisors
acquainted with each typist was required for inclusion in one of
these two groups. Although the number of girls included in
each group was small, the number was considered sufficiently
large to warrant preliminary modification of the test. As a
result of this tryout, the entire test was extensively revised,
and seven of the twenty work samples were eliminated. The
revised test was administered to a group similar to the one
first used, the procedure and conditions of administration
remaining the same. Six additional work samples were eliminated
on the basis of low validity coefficients or high
intercorrelations with remaining parts. Accumulation of
additional data has subsequently resulted in elimination or
modification of other parts, and the present test consists of
five carefully selected work samples. One of these is used as
practice material so that the final score is based upon four
selections. These parts are described later.

Considerable attention was given to developing a test 
which not only has a high validity coefficient, but which also 
appears valid to persons to whom the test is administered and 
to supervisors of such persons. It has been the author’s 
experience that apparent validity is of equal importance with 
statistical validity. A test must have both if it is to be used 
successfully for industrial selection and training. 

Development of Error Scores 

The test was originally scored for errors in such a way 
that the test situation was as comparable as possible with 
actual work situations. Test results were examined from the 
viewpoint of whether or not similar work would be accepted
by supervisors if submitted by an employed stenographer.
Penalties for errors varied in direct proportion to the time 
required to make the work acceptable. No penalty was given 
for errors which did not affect the acceptability of the work, 
such as neat erasures. Errors of such a nature that the item 
would have to be retyped in order to be usable were penalized 
in proportion to the time required to type that item. This was 
subsequently modified so that the penalty was in proportion 
to the time required to retype the item inasmuch as it was 
found that the retyping time was not proportional to the 
original typing time. Errors that could be corrected so that 
material could be used in an actual work situation were
penalized in proportion to the time required to make the necessary
corrections. 

Statistical analyses showed that the use of maximum
penalties reduced the validity of the test by preventing
differentiation between poor typists. For example, in one test
part, the testee must alphabetize the material, tabulate in three
columns, make a carbon copy, etc. A testee was given the
maximum penalty if she neglected to alphabetize the material;
others who made all of the errors listed above would be given
the same penalty as the first girl. Although the error scores
for these girls would be identical, the quality of work on the
test item concerned would be far from the same.

Originally no penalty was made for corrected errors if
erasures were neatly made. The assumption was made that
girls making erasures automatically penalized themselves by
increasing the time required for completion of the test.
Statistical analyses showed, however, that validity coefficients
were increased by penalizing such corrected errors.

On the basis of errors made in the preliminary forms of
the test, an item analysis was made of the seriousness of each
error. This analysis showed that three classes of errors were
sufficient for a total error score. The three classes were named:
(1) corrected errors, (2) minor errors, and (3) gross errors.
On the basis of the item analysis, the following three
definitions were established to explain the three classes and to
assist in the subsequent determination of the seriousness of
errors found so infrequently that they could be given no
statistical weight:

Corrected Errors are those which have been
corrected by the typist (e.g., neat erasures). Each unit
correction (whether it be a letter, word, or phrase) is
counted as one corrected error.

Minor Errors are those which are correctible
(e.g., misspelled words, strike-overs, etc.) or which
detract from the form, arrangement, or neatness of the
finished work to the extent that the material is
acceptable for use but is below the desired standard.

Gross Errors are those which cannot be corrected
unless the work is retyped (e.g., failure to make a
carbon copy, failure to tabulate the material, etc.),
those which are equivalent to two minor errors, or
those which result in form, arrangement, or neatness
below the minimum standard of acceptability.



Error weights for gross, minor, and corrected errors were
statistically determined by means of biserial validity coefficients
for each type of error in a group of 63 employed typists, the
criterion of success being grade of job successfully filled. Beta
weights in raw score form were obtained through application
of the Wherry-Doolittle test selection technique (5) using
intercorrelations based on 250 applicants for typing positions.¹
In order to simplify scoring procedures, beta weights were
rounded to the nearest whole number. The total error score
is the sum of the corrected errors plus two times the sum of
minor errors, plus four times the sum of gross errors. Data
are summarized in Table 1.

TABLE 1

DEVELOPMENT OF ERROR WEIGHTS

Type of      Biserial      Beta Weight     Raw Score      Rounded
Error        Validity r    Z-Score Form    Beta Weight    Weight

Gross                                      -.3212         4
Minor                                      -.1622         2
Corrected                                  -.0915         1
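
The weighting rule stated above (corrected errors counted once, minor errors twice, gross errors four times) can be sketched in a few lines of Python. The function name and the example counts are illustrative assumptions, not part of the published scoring materials.

```python
# Rounded weights from Table 1: corrected = 1, minor = 2, gross = 4.
ERROR_WEIGHTS = {"corrected": 1, "minor": 2, "gross": 4}


def total_error_score(corrected, minor, gross):
    """Return the weighted total error score for one test paper."""
    counts = {"corrected": corrected, "minor": minor, "gross": gross}
    if any(n < 0 for n in counts.values()):
        raise ValueError("error counts cannot be negative")
    return sum(ERROR_WEIGHTS[kind] * n for kind, n in counts.items())


# Hypothetical typist: 3 corrected, 2 minor, and 1 gross error.
example_total = total_error_score(corrected=3, minor=2, gross=1)
print(example_total)  # 3*1 + 2*2 + 1*4 = 11
```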


Use of Combined Speed-Accuracy Score 

In some cases it is desirable to interpret test scores from 
the two viewpoints of speed and accuracy. Usually, however, 
a combined score is preferable inasmuch as speed is worth
little if not accompanied by accuracy, and accuracy is worth
little if not accompanied by speed. Further, any typist can
increase her speed at a sacrifice of accuracy or improve accuracy
at a sacrifice of speed; therefore a combined score which is a 
function of both speed and accuracy will describe the typing 
performance better than either speed or accuracy alone. 

Usual methods for combining scores (such as converting
to standard scores or weighting by the reciprocal of the
standard deviation) cannot be used with these data inasmuch
as the distributions are not of the same shape and error
distributions are far from symmetrical (3). Time distributions
closely approximate normality and error distributions are
greatly skewed toward the high error end. This is as expected,
because one end of the distribution will be zero errors, whereas
there is no limit at the other end of the distribution.

¹This procedure assumes that the larger group is comparable in all relevant
respects with the smaller (criterion) group. The multiple correlation obtained
from intercorrelations based on the expanded group may be larger or smaller
than that obtained from intercorrelations limited to the criterion group. The
use of an expanded group, however, has been found to increase test validity
when the test is used with another group in a follow-up study (4).

Error scores were transformed into “converted errors” by
means of the best fitting curved line in accordance with Horst's
method (3). The resultant distribution, based on 636 cases,
was normal, and was expressed in terms of a mean and sigma
equal to that of the time distribution. Total error scores are
changed into converted error scores by means of Table 2. The
time score can be added directly to the converted error score
to obtain a combined score giving equal weight to speed and
accuracy. Obviously, the time and converted error scores can
be weighted in any other desired manner.


TABLE 2

TABLE FOR CHANGING TOTAL ERRORS INTO
CONVERTED ERROR SCORE

T.E.   C.E.S.     T.E.   C.E.S.     T.E.     C.E.S.     T.E.       C.E.S.

  0      26        15      69       30         86       48-49        98
  1      30        16      70       31         87       50-52        99
  2      34        17      72       32         88       53-56       100
  3      38        18      73       33         88       57-60       101
  4      42        19      74       34         89       61-65       102
  5      46        20      75       35         90       66-71       103
  6      49        21      77       36         91       72-77       104
  7      52        22      78       37         91       78-83       105
  8      55        23      79       38         92       84-89       106
  9      57        24      80       39         93       90-95       107
 10      59        25      81       40         93       96-101      108
 11      61        26      82       41         94       102-108     109
 12      63        27      83       42-43      95       109-114     110
 13      65        28      84       44-45      96       115-120     111
 14      67        29      85       46-47      97       121-126     112

T.E. = total errors obtained by 4 × gross, plus 2 × minor, plus corrected
errors.

C.E.S. = converted error score which can be combined directly with time
score.


Nature of Test 

The Typing Ability Analysis² consists of five parts, each
being complete in itself. The first part is not scored, and
consists of typing identification material such as name, address,
and date. Part Two consists of approximately 150 words of
a draft of part of an article to be typed. The work copy is in
typed form, but contains thirty-five errors which are marked
for correction. Each error is accompanied by the correct form
which is to be used. Part Three requires the tabulating of
seven lines in three columns, together with appropriate column
headings, title, etc. The fourth part consists of a letter ninety
words in length. The letter is written in longhand and
contains ten changes also made in longhand. Part Five requires
alphabetizing and tabulating fifteen lines of authors’ names,
book titles, and publication dates, together with typing column
headings and title.

All test parts contain instructions such as, “make a carbon
copy on yellow paper,” “type the heading in capitals and
underline it,” “place your initials and the present date in the
lower left-hand corner,” etc. Failure to follow directions is
penalized. In a few cases penalties are made for items not
specifically mentioned in the instructions; for example, failure
of the typist to place the date or her initials on the letter.
Such penalties are made only for fundamental errors and
failure to follow universal practice as taught in all typing classes
and required of all industrial typists. Table 3 contains a list
of all probable errors in each test part, classed according to
whether they are scored as corrected, minor, or gross errors.

²Published by Science Research Associates.

TABLE 3

CLASSIFICATION OF ERRORS

Error                                         Corrected / Minor / Gross

PART II. ROUGH DRAFT

Not on page 15                                x
No carbon copy                                x
Carbon not on yellow paper                    x
Not double spaced                             x
Omission of word or phrase                    x
Did not make indicated change                 x x
Strikeover                                    x
Misspelled word                               x
Poor appearance                               x x
Other error                                   x x
Corrected errors                              x

PART III. TABULATING

Not on page 13                                x
No carbon copy                                x
Carbon not on yellow paper                    x
No title                                      x
Title not in caps                             x
Title not underlined                          x
Headings not underlined                       x
Not tabulated                                 x
Omit "1941" and/or "1942"                     x
Omit "Type of Paper"                          x
Columns in wrong order                        x
3 or more figures out of alignment            x
1 or 2 figures out of alignment               x
Omit word "total"                             x
No line above "total"                         x
No line below "total"                         x
"Total" lines not extended                    x
No initials                                   x
No date                                       x
Strikeover                                    x
Misspelled word                               x
Incorrect figure                              x
Poor appearance                               x x
Other error                                   x x
Corrected errors                              x

PART IV. LETTER

Not on letterhead                             x
No carbon copy                                x
Carbon not on yellow paper                    x
More than [illegible] after last line         x
[illegible] to [illegible] after last line    x
No date                                       x
No initials                                   x
Omit "encl."                                  x
Less than 3 lines for signature               x
Did not make indicated change                 x x
Strikeover                                    x
Misspelled word                               x
Poor appearance                               x x
Other error                                   x x
Corrected errors                              x

PART V. ALPHABETIZING

Not on page 9                                 x
No carbon copy                                x
Carbon not on yellow paper                    x
No title                                      x
Title not in caps                             x
Title not underlined                          x
Column headings not underlined                x
Omit 2 or 3 column headings                   x
Omit 1 column heading                         x
Not tabulated                                 x
Less than 3 spaces between columns            x
3 or more items out of alignment              x
1 or 2 items out of alignment                 x
Initials precede names                        x
Line in wrong order (max. 2 gross)            x
Line omitted                                  x
Incorrect publication date                    x
Incorrect book title                          x
Strikeover                                    x
Misspelled word                               x
More than 5 punctuation errors                x
1-5 errors in punctuation                     x
Poor appearance                               x x
Other error                                   x x
Corrected errors                              x

The Typing Ability Analysis is a work-limit test, each girl
being permitted to complete the test. The shortest testing
time in 636 cases was 26 minutes, and the longest time was
120 minutes. The average (mean) time required by industrial
applicants is 61 minutes, and by high school seniors is 78
minutes. Although these average times are lengthy when
compared with other typing tests, the increased time is warranted
by the high validity of the test. Inasmuch as the test
administrator does no work after starting the test except to record
the finishing time of the testee, the time required is of little
practical importance to the administrator. The time element
becomes important only if the typewriter being used cannot be
spared for the required time, or if the testee objects to the
time required. It has been the experience of the author that
the practical appearance of the test tends to eliminate
objections of testees to the time required.

In some cases it may be desirable to administer the test
with a time limit, particularly when all that is desired is a yes
or no decision as to whether or not an applicant should be
hired. The time limit should be determined in such cases by
deciding on the lowest percentile in terms of combined score
which will be acceptable. Assuming no errors, the converted
error score for no errors (26) should be deducted from the
combined score representing the previously selected percentile.
The resultant figure will give the time limit to be allowed. Or,
if the administrator prefers, the time limit can be called when
a desired percentage of the applicants have completed the test,
all applicants who have failed to complete the test being
eliminated from consideration. Use of the test with a time limit
makes it impossible to secure a speed, accuracy, or combined
score. Results therefore cannot be used with maximum
effectiveness for guidance or training.
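
The time-limit rule described above is a single subtraction, sketched here in Python. The function name is hypothetical, and the cutoff of 128 is used only as an illustration (it is the combined SA score listed at the 50th percentile for industrial applicants in Table 4).

```python
# Converted error score assigned to a paper with no errors (Table 2).
ZERO_ERROR_CONVERTED_SCORE = 26


def time_limit_minutes(cutoff_combined_score):
    """Time limit implied by the lowest acceptable combined (SA) score.

    A typist making no errors earns a converted error score of 26, so she
    can spend at most (cutoff - 26) minutes and still reach the cutoff.
    """
    limit = cutoff_combined_score - ZERO_ERROR_CONVERTED_SCORE
    if limit <= 0:
        raise ValueError("cutoff must exceed the zero-error converted score")
    return limit


# Illustrative cutoff: an SA score of 128.
print(time_limit_minutes(128))  # 102
```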

Directions for Administering the Test 

The Typing Ability Analysis is practically
self-administering, and may be given either individually or in groups. In
addition to a typewriter, each person taking the test should 
have the following materials: eraser, erasing shield, pencil, a 
sheet of carbon paper, 4 sheets of yellow paper for carbon 
copies, and a test booklet. 

Before starting the test, each testee is permitted sufficient
time to become familiar with the typewriter being used. Test
booklets are issued with the instructions, “Read the
instructions on the first page of the test. Do not turn the page
or start the test until told to do so.” Instructions appearing on
the first page of the test booklet are as follows:

This test measures ability to do the kind of typing
that is required in business and industry.

Test results will be judged by usual office standards
and will be rated on the basis of accuracy, speed, and
form. Errors will be penalized in proportion to their
seriousness, least for errors which have been corrected,
more for errors which could have been corrected, and
most for errors which would require retyping of the
part in which they occur.

Errors may be corrected by erasure. Do not retype
any part unless absolutely necessary, and in such
case use the back side of the same sheet of paper.

Work as rapidly as possible, but do the kind of
work desired by an employer.

After the instructions have been read, the examiner gives
the signal to start the test. The exact time that each person
starts and finishes the test is recorded. When the tests are
completed, the examiner makes sure that the items are in
correct order, and then staples all parts together. The time
of starting and finishing is recorded on the first page of the
test.

Scoring

The time (speed) score is the number of minutes required 
to complete the test. 

The error score is based on three types of errors: (1) 
corrected errors, (2) minor errors, and (3) gross errors, an 
explanation of which was given earlier. 

Each test part is proofread carefully and compared with
the instructions. Errors are marked on the test by encircling
them with a red pencil, and recorded on the rating sheet. The 
rating sheet contains a list of all errors commonly made, but 
is not a complete list of all possible types of errors. When 
unlisted errors are made, the definitions for errors as given 
previously should determine whether they are gross or minor. 


A single error is not penalized twice. For example, figures 
out of alignment are penalized as such. An additional penalty 
is not given for poor appearance. If in copying the rough 
draft, the typist spells screen as “serene” a penalty is given for 
either misspelled word or failure to make indicated change. 
The error is not penalized in both ways. 

When penalizing results for “poor appearance,” the 
quality of paper is taken into consideration, inasmuch as neat
erasures are almost impossible on some types of paper, but 
are easily made on other types. 

Norms 

As has previously been mentioned, a combined score will
generally be more valuable than separate speed and accuracy
scores. Correlational and regression equation techniques
showed that for the typing jobs considered in this study, speed
and accuracy were approximately equal in importance.
Combined scores of maximum efficiency should be obtained by
multiple correlations and regression equations based on test results
of typists hired for each company using the test. Norms given
here for three different weightings of speed and accuracy,
however, will be adequate approximations for most jobs. The
most suitable of the three will generally be the SA score which
gives equal weight to speed and accuracy. It is obtained by
adding the time score to the converted error score. The 2SA
score (two times the speed score plus the converted error
score) weights speed and accuracy in the ratio 2:1 and is
suitable for jobs that require fast speed and where accuracy is
comparatively unimportant (as might be the case for a rough
draft copy typist). The S2A score (speed score plus two
times the converted error score) weights speed and accuracy
in the ratio 1:2 and is suitable for jobs placing a premium on
high accuracy and where speed is comparatively unimportant
(as in the case of some typists of legal documents).
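
The three weightings just described can be sketched as follows in Python. The function name and the example figures (61 minutes, converted error score of 67) are illustrative assumptions; lower scores indicate better performance on every scale.

```python
def combined_scores(time_minutes, converted_error_score):
    """Return the SA, 2SA, and S2A combined scores for one typist."""
    speed, accuracy = time_minutes, converted_error_score
    return {
        "SA": speed + accuracy,        # equal weight to speed and accuracy
        "2SA": 2 * speed + accuracy,   # speed weighted 2:1 over accuracy
        "S2A": speed + 2 * accuracy,   # accuracy weighted 2:1 over speed
    }


# Hypothetical typist: 61 minutes, converted error score of 67.
print(combined_scores(61, 67))  # {'SA': 128, '2SA': 189, 'S2A': 195}
```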

Industrial and educational norms are given in Table 4.
Industrial norms are based on 381 applicants for typing
positions. Educational norms are based on 255 high school seniors
who were given the test from one to two months previous to
being graduated and who were in the advanced (second year)
typing class.

TABLE 4

NORMS

                          Industrial                             Educational
                     (N = 381 Applicants)                  (N = 255 H.S. Seniors)

Per-     Stand.
centile  Score    Time   Errors*  SA      2SA     S2A      Time      Errors*  SA          2SA     S2A

99.9      3.00     16              54      78      76      [illeg.]            76         116     104
99        2.33     26      0       71     103     103       41        1        91         139     128
98        2.05     30      1       77     111     114       46        2        97         148     137
95        1.64     37      3       88     128     130       51        4       106         162     152
90        1.28     42      5       96     142     144       57        5       114         174     164
80         .84     49      7      107     158     161       63        8       123         188     179
70         .52     54      9      115     170     174       67       10       130         199     191
60         .25     58     12      120     180     185       71       12       136         208     200
50         .00     61     14      128     189     194       75       14       142         217     209
40        -.25     65     16      134     199     204       78       16       147         225     217
30        -.52     69     20      141     209     215       82       20       153         234     227
20        -.84     74     25      149     221     228       87       24       160         245     238
10       -1.28     81     32      159     237     245       93       31       170         260     253
 5       -1.64     86     41      168     250     259       98       38       177         272     266
 2       -2.05     92     58      178     266     275      101       49       186         285     280
 1       -2.33     97     81      185     276     286      108       66       192         295     290

M                61.41  17.20  127.96  189.37  194.50     78.78    17.14    141.76      216.64  208.63
S.D.             15.14  11.60   24.61   37.26   39.37     14.33    11.08    [illeg.]     33.53   34.81

* Percentiles and standard scores computed on basis of converted error
scores. For convenience, errors reported in this table are total error scores.


Norms are based on standard scores for selected percentile
points. Analysis of data by means of the Otis Normal
Percentile Chart³ showed marked linearity (normality) of all
distributions. Standard scores can consequently be used to
determine percentile points. The converted error score was
used for computing error norms, though for convenience the
errors reported in Table 4 are expressed in terms of total
errors rather than converted error scores.
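
Because the distributions are treated as normal, a norm point follows directly from the mean, the standard deviation, and the standard score for the chosen percentile. The Python sketch below assumes the sign convention implied by Table 4 (lower scores are better, so a positive standard score lies below the mean) and rounds to the nearest whole number; the function name is hypothetical, and the figures are the industrial time mean and S.D. as read from Table 4.

```python
def norm_point(mean, sd, standard_score):
    """Score at a given standard score, for scales where low is good."""
    # A positive standard score denotes better (lower) performance,
    # so it is subtracted from the mean.
    return round(mean - standard_score * sd)


# Industrial time scores: M = 61.41, S.D. = 15.14.
print(norm_point(61.41, 15.14, 1.28))   # 90th percentile -> 42 minutes
print(norm_point(61.41, 15.14, -1.28))  # 10th percentile -> 81 minutes
```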

Correlations Between Speed and Accuracy

Correlations between speed (minutes) and accuracy
(converted error scores) are all low. Summarized data are given
in Table 5.


³Published by World Book Company.



TABLE 5

CORRELATIONS BETWEEN SPEED AND ACCURACY

Group                        N       r        Standard error

All cases                    636     +.137    ±.04
Industrial Applicants        193     +.213    ±.07
Civil Service Applicants     188     +.200    ±.07
High School Seniors          255     +.077    ±.06
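
The article does not state how the standard errors in Table 5 were computed. As a hedged check, the classical large-sample formula for the standard error of a correlation, (1 - r²)/√N, reproduces the tabled values to two decimals; the Python sketch below assumes that formula and uses a hypothetical function name.

```python
import math


def standard_error_of_r(r, n):
    """Classical large-sample standard error of a correlation coefficient.

    Assumed formula (not given in the article): (1 - r^2) / sqrt(N).
    """
    return (1 - r ** 2) / math.sqrt(n)


# Rows of Table 5: (group, N, r).
for group, n, r in [
    ("All cases", 636, 0.137),
    ("Industrial Applicants", 193, 0.213),
    ("Civil Service Applicants", 188, 0.200),
    ("High School Seniors", 255, 0.077),
]:
    print(group, round(standard_error_of_r(r, n), 2))
```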


Validity 

Validity is based on 67 employed typists in an industrial
population. Typists were dichotomized on the basis of grade
of job successfully filled, and validity determined by means of
biserial coefficients. The p group consisted of 28 girls
employed in Kimberly-Clark Corporation’s home office or in
positions of similar caliber in various mills of the corporation.
The q group was composed of 39 girls employed in mill offices
and mailing departments of the same corporation. All girls
were engaged in work which was primarily typing.

Guilford (2) has pointed out that: “a biserial r should
not be computed unless the graduated series of measurements
is reasonably well normally distributed and unless N is
relatively large, preferably when N is greater than 50. Another
important condition is that the cases be not too unevenly
divided between the two distributions.” These data fulfilled
the above requirements reasonably well. The N of 67 was
divided into p and q groups containing 42% and 58% of the
cases. Pearson’s chi-squared test of goodness of fit gave a P
of .748 for a combined score based on equal weighting of
speed and accuracy. Culler (1) classes this as an “excellent”
fit. Validity coefficients are given in Table 6.⁴

⁴It may be pointed out that an assumption underlying the derivation of
biserial r is that the dichotomized trait is in reality continuous and normally
distributed. If this condition does not hold, the size of biserial r may be
appreciably affected; the value of r indicating perfect relationship may be
considerably greater than unity and obtained r’s greater than would
otherwise be obtained. It is entirely possible that this assumption was not fulfilled
with these data, although the magnitude of the effect cannot be measured due
to lack of methods which can be used to demonstrate normality of criteria in
cases such as this. An attempt was made to approach normality so far as
possible by including all grades of typists ranging from those in beginning typing
jobs to those in the highest grade typing jobs of Kimberly-Clark Corporation.



TABLE 6

VALIDITY COEFFICIENTS

(N = 67 Employed Industrial Typists)

Score                Validity r     Standard Error

Combined (SA)        [illegible]    [illegible]
Time                 [illegible]    ±.08
Converted errors     .711           ±.09
Raw errors           .65[?]         ±.10
Gross errors         .45[?]         ±.13
Minor errors         .555           ±.12
Corrected errors     [illegible]    [illegible]


Additional evidence of validity was secured by comparing
the differences in combined (SA) scores between various
groups. Differences are summarized in Table 7.

TABLE 7

COMPARISON OF GROUPS TO DETERMINE VALIDITY

     Group                                         N       M         S.D.

A.   Ind. typists, high job classification         28      91.89     14.04
B.   Ind. typists, low job classification          39     135.05     20.85
C.   Ind. typists, released, inadequate            10     178.50     21.71
D.   Civil Service applicants, Jr. typists         63     122.00     20.71
E.   Civil Service applicants, Asst. typists      125     130.22     19.57

Groups Compared      Critical Ratio      Significance

A and B              9.97                .999
B and C              5.18                .999
D and E              2.60                .995

Critical ratios were computed by dividing the difference between the
means by the standard error of the difference. The standard
error of the mean was computed by formulae suitable for small
samples (2), as follows:

$$\sigma_M = \frac{\sigma}{\sqrt{N-1}}$$

together with a second formula, illegible in the source, applying when $10 < N < 20$.

All critical ratios are significant at the 1% level, thereby 
giving additional evidence of test validity by differentiating 
between groups which logically should show differences in 
ability. 
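
The critical ratio for groups A and B can be retraced with a short calculation. The Python sketch below assumes that the standard error of each mean is S.D./√(N - 1) and that the standard error of the difference is the square root of the sum of the squared standard errors (independent groups); under these assumptions the reported value of 9.97 is reproduced. The function names are hypothetical.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean, assuming S.D. / sqrt(N - 1)."""
    return sd / math.sqrt(n - 1)


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two means divided by its standard error."""
    # Assumes independent groups, so the squared standard errors add.
    se_difference = math.sqrt(se_mean(sd_1, n_1) ** 2 + se_mean(sd_2, n_2) ** 2)
    return abs(mean_1 - mean_2) / se_difference


# Groups A and B from Table 7 (combined SA scores).
ratio_a_b = critical_ratio(91.89, 14.04, 28, 135.05, 20.85, 39)
print(round(ratio_a_b, 2))  # 9.97
```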




Reliability 

No adequate situation has yet been found in which to 
secure accurate reliability coefficients. Split-half methods are 
inapplicable inasmuch as test parts were selected on the basis 
of low intercorrelations as well as high validity coefficients. 
Two equivalent forms have been constructed; however, their 
use will usually be inadequate for reliability coefficients
because of the effect of practice or disuse between testing periods.
The same is true of repeated administration of one test form. 

Table 8 presents reliability data obtained by administering
two test forms to 63 high school seniors in their second-year
typing class. The first test administration was in April and
the second in May. Reliability is probably higher than
indicated, inasmuch as practice between the two test
administrations varied considerably; e.g., some girls missed many typing
classes because of commencement activities whereas others
received considerable extra practice because of typing material
for the school annual. Reliability coefficients quoted here are
thus lowered by irrelevant influences of speed of learning and
opportunity to learn. In spite of the unfavorable conditions
under which reliability was secured, results nevertheless
indicate reasonable reliability.


TABLE 8

RELIABILITY AND PROBABLE ERRORS

(N = 63 High School Seniors)

                          Reliability               Probable Error of
                     Coefficient    Index      Obtained Score    True Score

Time                    .768         .876           4.66            4.08
Converted Errors        .720         .848           5.47            4.64
SA Score                .832         .912           6.02            5.49
2SA Score               .846         .920           9.23            8.49
S2A Score               .800         .894          10.50            9.39
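
The article does not give the formulas behind Table 8. As a hedged reading, the tabled index of reliability agrees with the square root of the reliability coefficient, and the probable error of a true score agrees with the probable error of an obtained score multiplied by that index. The Python sketch below assumes these two relations; the function names are hypothetical.

```python
import math


def index_of_reliability(reliability):
    """Index of reliability, assumed to be the square root of r."""
    return math.sqrt(reliability)


def probable_error_true_score(pe_obtained, reliability):
    """Probable error of a true score, assumed to be PE(obtained) * sqrt(r)."""
    return pe_obtained * math.sqrt(reliability)


# Time row of Table 8: r = .768, probable error of obtained score = 4.66.
print(round(index_of_reliability(0.768), 3))             # 0.876
print(round(probable_error_true_score(4.66, 0.768), 2))  # 4.08
```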


Interpretation and Use of Scores 

The Typing Ability Analysis is now being used for several
purposes and in several types of situations. The purpose for
its use in any particular situation determines the way in which
results should be interpreted and utilized.

Various companies are using the analysis for selecting
typists for specific openings in stenographic and secretarial
work. The test obviously should not be used as the sole means
of selecting typists. Many factors must be taken into
consideration, and this analysis measures only one group of such
factors. The test should be used as a supplement to other
selection methods and procedures, and test results should be
interpreted in light of all other pertinent factors. Most
companies using the test for employee selection do not discuss test
results with applicants, although some believe that the time
required for such discussion is warranted on the basis of
improved public relations.

When the analysis is used for vocational selection, the
industrial norms will usually apply because the employment
manager will be interested most in knowing how a given
applicant compares with other applicants. In the case of a
recent high school graduate without any job experience, the
employment manager may also wish to know how the applicant
compares with high school seniors. Thus he is able to estimate
not only how qualified she is at the present time, but also how
satisfactory she is apt to be after securing typing experience.

A second major use of the analysis is in training currently 
employed typists. Results are discussed with the girl concerned
and she is told whether her speed and accuracy are acceptable, 
the type (or types) of error she is prone to make, and other 
shortcomings which should be corrected in order that her work 
may be improved. 

A third use of the analysis has been in the upgrading of 
industrial employees. Results have been used in the same way 
as in training in order to help typists prepare themselves for 
higher-caliber typing jobs, or to enable girls working in the 
plant on production jobs to fit themselves for typing jobs in
the office. 




High schools are using the analysis for vocational
guidance. Such usage generally includes suggestions for improving
typing ability as well as recommendations regarding types of 
jobs which can be filled successfully. High school teachers will 
usually be interested in using the educational norms, although 
it may also be of value to determine how a particular student 
who will soon be a job applicant compares with other job 
applicants. Although high school teachers sometimes believe 
that such comparison is unfair due to lack of experience on 
the part of high school students, it must be remembered that 
most industrial personnel men are more interested in hiring 
an applicant with good typing ability than in hiring a promising 
high school student who compares favorably with other 
students but who cannot compete successfully with other job 
applicants. This is particularly true during depression periods 
when jobs are scarce and applicants are numerous. 

The Typing Ability Analysis can also be used to compare 
the ability of various typing classes, efficiency of different 
teachers, rate of progress, etc., in all situations where typing 
ability is defined as those factors which differentiate between 
successful and unsuccessful typists in the industrial situation. 

REFERENCES 

1. Culler, E. “Studies in Psychometric Theory,” Journal of 
Experimental Psychology, IX (1926), 169-194. 

2. Guilford, J. P. Psychometric Methods. New York: McGraw-Hill, 
1936, 51-52, 351. 

3. Horst, Paul. “Obtaining Comparable Scores from Distributions of 
Dissimilar Shape,” Journal of the American Statistical Association, 
XXVI (1931), 455-460. 

4. Jurgensen, C. E. “Extension of the Minnesota Rate of 
Manipulation Test,” Journal of Applied Psychology, (1942). In press. 

5. Stead, W. H., Shartle, C. L., et al. Occupational Counseling 
Techniques. New York: American Book Co., 1940, 245-252. 






MEASUREMENT ABSTRACTS* 

Bryan, Alice I. and Wilke, Walter H. “Audience Tendencies 
in Rating Public Speakers.” Journal of Applied Psychology, 
XXVI (1942), 371-381. 

Using the Bryan-Wilke Scale for rating public speeches, 
the authors studied a variety of audiences with a number of 
factors that are associated with audience ratings, such as 
factors related to analytical ability of audience, time of rating, 
effect of age of raters, influence of sex of raters, and 
intelligence and personality of speaker. Louise T. Grossnickle. 


Carter, Harold D. “How Reliable are the Common Measures 
of Difficulty and Validity of Objective Test Items?” 
Journal of Psychology, XIII (1942), 31-38. 

Two hundred college students, mostly juniors, were given 
an objective test, consisting of 80 items, of which 30 were 
true-false, 30 multiple choice, and 20 of the completion 
variety. The purpose was to ascertain the reliability of 
measures of item difficulty and of item validity by means of 
different subgroups. The author found that a measure of 
difficulty of test items, based upon a representative sampling, 
yielded a higher reliability coefficient than that obtained from 
the ordinary method of using good and poor students. K. S. 
Yum. 


DuBois, Philip H. “A Note on the Computation of Biserial r 
in Item Validation.” Psychometrika, VII (1942), 143-146. 

A method of computing biserial coefficients of correlation 
through the use of punch card tabulating equipment is 
presented. Each item is assigned a separate column and successes 




*Edited by Forrest A. Kingsbury. 





are punched 1. By arranging the cards on the criterion 
variable and obtaining progressive sums on several columns 
simultaneously, it is possible to obtain data for several 
correlations in one run of the cards through the machine. (Courtesy 
Psychometrika.) 
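
As a rough modern illustration of the quantity being computed (not of the punch card procedure itself), the sketch below evaluates the textbook biserial formula, (M_p - M_q) / s times pq / y, for items scored 1 or 0 against a criterion. The function name `biserial_r` and the sample scores are hypothetical; the abstract does not give the author's own computing layout.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(item, criterion):
    """Biserial correlation between a 0/1 item and a continuous criterion.

    Textbook formula: (M_p - M_q) / s * (p * q / y), where y is the
    ordinate of the unit normal curve at the point cutting off proportion p.
    Assumes the item has at least one success and one failure.
    """
    passed = [c for i, c in zip(item, criterion) if i == 1]
    failed = [c for i, c in zip(item, criterion) if i == 0]
    p = len(passed) / len(item)
    q = 1.0 - p
    unit_normal = NormalDist()
    y = unit_normal.pdf(unit_normal.inv_cdf(p))
    return (mean(passed) - mean(failed)) / pstdev(criterion) * (p * q / y)


# Hypothetical data: one column per item, successes scored 1.
criterion = [8, 7, 6, 5, 4, 3, 2, 1]
items = {
    "item_1": [1, 1, 0, 1, 0, 1, 0, 0],
    "item_2": [1, 1, 1, 1, 1, 0, 0, 0],
}
# One pass over the item columns yields a coefficient for each item.
coefficients = {name: biserial_r(col, criterion) for name, col in items.items()}
```

Looping over the item columns loosely mirrors the idea of obtaining several correlations in a single run, although here each column is simply processed in turn.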


Engelhart, Max D. “Unique Types of Achievement Test 
Exercises.” Psychometrika, VII (1942), 103-115. 

In this article are presented a number of unusual 
achievement test exercises of both the essay and the objective types. 
These exercises may suggest to others engaged in the 
construction of achievement tests certain forms which they may find 
useful either as models or as points of departure in the 
invention of new forms. The article also calls attention to certain 
problems which must be solved if achievement testing is to 
have a sound, scientific basis. (Courtesy Psychometrika.) 


Estes, Stanley G. “A Study of Five Tests of ‘Spatial’ Ability.” 
Journal of Psychology, XIII (1942), 265-271. 

The object of the study was to determine the extent to 
which each of five tests, all of them requiring response to 
spatial relationships, were related to each other and to 
achievement in descriptive geometry, a subject where the ability under 
consideration was of basic importance. The correlations of 
four of these tests with descriptive geometry were all reliably 
greater than zero and did not differ significantly from each 
other. Therefore, the author concluded that the tests, with 
the exclusion of the Crawford Structural Visualization Test, 
were equally valid with the criterion he used. K. S. Yum. 


Ferguson, Leonard W. and Lawrence, Warren R. “An 
Appraisal of the Validity of the Factor Loadings Employed 
in the Construction of the Primary Social Attitude Scales.” 
Psychometrika, VII (1942), 135-138. 





In this article the authors examine the effect of including 
alternate test forms in a factor matrix upon the validity of 
the resultant factor loadings, finding that in this particular 
instance the effect is negligible. Comparisons of the factor 
loadings derived from matrices in which only one of the 
alternate test forms is included with those in which both forms are 
included reveal practically no difference in the magnitude of 
either the original or rotated factor loadings, or in that of the 
computed communalities. (Courtesy Psychometrika.) 


Ghiselli, Edwin E. “Estimating the Minimal Reliability of a 
Total Test from the Intercorrelations Among, and the 
Standard Deviations of, the Component Parts.” Journal 
of Applied Psychology, XXVI (1942), 332-337. 

Due to the nature of a test, two equivalent parts are not 
available for estimating its reliability. However, if, in all of 
the parts, sigmas are equal and the intercorrelations are equal, 
then the Spearman-Brown correction formula for any length 
can easily be derived from a general formula for the reliability 
coefficient of the total test. Since the intercorrelation 
$r_{12}$ will be a minimum estimate of the reliability of a part, 
it is possible to obtain a minimal reliability 
coefficient of the total test. K. S. Yum. 
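
The reduction described above can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard formula for the reliability of a sum of parts (which may not be the exact "general formula" of the article), takes the common intercorrelation as the assumed reliability of each part, and uses hypothetical function names and values.

```python
def spearman_brown(r, n):
    """Reliability of a test n times as long as a part with reliability r."""
    return n * r / (1 + (n - 1) * r)


def composite_reliability(sds, corr, part_rel):
    """Reliability of a sum of parts.

    sds      -- standard deviation of each part
    corr     -- matrix of intercorrelations among the parts
    part_rel -- assumed reliability of each part
    """
    k = len(sds)
    total_var = sum(
        sds[i] * sds[j] * (1.0 if i == j else corr[i][j])
        for i in range(k)
        for j in range(k)
    )
    error_var = sum(sds[i] ** 2 * (1 - part_rel[i]) for i in range(k))
    return 1 - error_var / total_var


# Hypothetical case: three parts, equal sigmas, equal intercorrelations of .40,
# with the intercorrelation used as the (minimum) reliability of each part.
r = 0.40
sds = [2.0, 2.0, 2.0]
corr = [[1.0, r, r], [r, 1.0, r], [r, r, 1.0]]
minimal = composite_reliability(sds, corr, [r, r, r])
stepped_up = spearman_brown(r, 3)
```

With equal sigmas and equal intercorrelations the two values agree, which is the special case the abstract describes.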



Kelley, Truman L. “The Reliability Coefficient.” 
Psychometrika, VII (1942), 75-83. 

The reliability coefficient is unlike other measures of 
correlation in that it is a quantitative statement of an act of 
judgment, usually the test-maker's, that the things correlated are 
similar measures. Attempts to divorce it from this act of 
judgment are misdirected, just as would be an attempt to 
eliminate judgment of sameness of function of items when a 
test is originally drawn up. A “coefficient of cohesion,” 
entirely devoid of judgment, measuring the singleness of test 
function is proposed as an essential datum with reference to a 
test, but not as a substitute for the similar-form reliability 
coefficient. (Courtesy Psychometrika.) 


Kuhlmann, F. and Odoroff, M. E. “Verification of the Heinis 
Mental Growth Curve on Results with the Stanford-Binet 
Tests.” Journal of Psychology, XIII (1942), 355-364. 

The usefulness of the I.Q. depends upon its constancy for 
the bright child and the dull child as well as for the typical 
child. However, this assumption is not warranted for all levels 
of intelligence. The late Dr. Kuhlmann preferred the index 
based on the Heinis mental growth curve because he believed 
in its superiority over the I.Q. for predictive purposes. This 
particular study, based on a huge number of cases, shows that 
the average Stanford-Binet I.Q. of a group of special class 
pupils drops approximately 1.5 points per year or a total of 
15 points between the ages of 6 and 16, while the average 
Heinis "personal constant," which Kuhlmann has renamed the 
"per cent of average" score, for the same cases shows no 
tendency to increase or decrease. Louise T. Grossnickle. 


Moffie, Dannie J. “A Non-verbal Approach to the Thurstone 
Primary Mental Abilities.” Journal of General Psychology, 
XXVII (1942), 35-61. 

An attempt was made to measure five of the Primary 
Mental Abilities (perceptual speed, space, inductive reasoning, 
deductive reasoning, and memory) by means of performance 
tests. The author was successful in finding non-verbal 
measures of the space, reasoning, and perceptual speed factors. 
Tests for inductive and deductive reasoning were found to be 
measures of one reasoning factor. Robert L. Cramer. 





Munroe, Ruth L. “An Experiment in Large Scale Testing by 
a Modification of the Rorschach Method." Journal of 
Psychology, XIII (1942), 229-263. 

This technique of the Inspection Diagnosis consists 
essentially of a systematic review of each protocol with special 
attention to twenty-four items known to be of significance in 
Rorschach diagnosis. The results show some very striking 
correspondence between the Rorschach ratings and three 
separate lines of validation material, and suggest a strong 
probability that the Rorschach Inspection Diagnosis is a valid and 
useful technique for large scale testing. Louise T. Grossnickle. 


Powell, Norman J. and Levine, Harold. “Reliability of the 
Civil Service Oral Examination.” American Journal of 
Psychology, LV (1942), 385-393. 

Ninety-nine applicants who had passed a written 
examination for the position of Junior Psychologist were interviewed 
and rated in the conventional manner by two panels acting with 
varying degrees of independence. Considerable differences in 
ratings were found in all cases. Robert L. Cramer. 


Stagner, Ross and Katzoff, E. T. “Fascist Attitudes: Factor 
Analysis of Item Correlations.” Journal of Social 
Psychology, XVI (1942), 3-9. 

Eighteen statements reflecting Fascist thought were 
presented to one hundred college students to be checked according 
to agreement or disagreement. A centroid factor analysis of 
the correlations between items showed three factors to be 
present: concern over protection of property rights, lack of 
sympathy for the unfortunate, and an aggressive nationalism. 
Robert L. Cramer. 







Taylor, William S. “Partialling out Sums of Squares and 
Products in Calculating Correlations with 
Non-homogeneous Data.” [British] Journal of Psychology, XXXII 
(1942), 31[?]-32[?]. 

If the population tested is homogeneous, a coefficient of 
correlation calculated directly from the deviations of 
individual scores about the grand mean will give a reliable 
indication of the correlation between the scores of the individuals 
tested. Where the population is not homogeneous, group 
differences may be significant. The correlation desired is that 
free from the influence of group differences. In this latter 
case, it is necessary to partial out the sums of squares and sums 
of products of deviations from the mean, using only those 
attributable to the deviations "within groups." K. S. Yum. 
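
The distinction drawn above can be illustrated with a small sketch. It assumes the usual pooled within-groups correlation (sums of squares and products taken about each group's own means), which may differ in detail from the author's procedure; the function names and the two small groups are hypothetical.

```python
from math import sqrt


def within_group_r(groups):
    """Correlation from sums of squares and products pooled within groups.

    groups -- list of groups, each a list of (x, y) score pairs.
    Deviations are taken about each group's own means, so differences
    between group means do not contribute.
    """
    sxx = syy = sxy = 0.0
    for group in groups:
        mean_x = sum(x for x, _ in group) / len(group)
        mean_y = sum(y for _, y in group) / len(group)
        for x, y in group:
            sxx += (x - mean_x) ** 2
            syy += (y - mean_y) ** 2
            sxy += (x - mean_x) * (y - mean_y)
    return sxy / sqrt(sxx * syy)


def total_r(groups):
    """Correlation about the grand mean, ignoring group membership."""
    everyone = [pair for group in groups for pair in group]
    return within_group_r([everyone])


# Hypothetical non-homogeneous population: two groups with very different means.
group_a = [(1, 3), (2, 2), (3, 1)]
group_b = [(11, 13), (12, 12), (13, 11)]
r_total = total_r([group_a, group_b])
r_within = within_group_r([group_a, group_b])
```

In this contrived case the correlation about the grand mean is strongly positive purely because of the difference between group means, while the within-groups correlation is negative.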


Thomson, Godfrey H. “Following up Individual Items in a 
Group Intelligence Test.” British Journal of Psychology, 
XXXII (1942), 31[?]-317. 

The article describes the technique used for item selection 
in the construction of Moray House Test 24, a group 
intelligence test, and presents a later follow-up study of the test 
items in their discriminating function. The research reveals 
that the predictive power of the various test items differs with 
different levels of educational achievement. The items that 
predict well in the secondary school are not necessarily the 
best indication of their power to discriminate the potential 
secondary school pupils from those not suitable. K. S. Yum. 


Thorndike, Robert L. “Regression Fallacies in the Matched 
Groups Experiment.” Psychometrika, VII (1942), 85-102. 

This paper is concerned particularly with certain regression 
effects which appear whenever matched groups are drawn from 
populations which differ with regard to the characteristics 
being studied. It is shown that regression will produce 
systematic differences between these groups on measures other 
than those upon which they were specifically matched. The 
size and direction of these differences depend upon the 
differences between the parent populations both in the matching 
and in the experimental variables and upon the correlation 
between the matching and experimental variables. Formulas 
are presented for estimating the expected regression effect. 
Several alternative procedures are suggested for avoiding the 
erroneous conclusions which the regression effect is likely to 
suggest. (Courtesy Psychometrika.) 
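
A small numerical sketch may make the regression effect concrete. It assumes ordinary linear regression with the same standard deviations and the same correlation in both parent populations; these simplifications, the function names, and the figures are illustrative assumptions and are not the formulas of the paper itself.

```python
def expected_y_at_match(x, mu_x, mu_y, sd_x, sd_y, r):
    """Regression estimate of the experimental variable Y for cases
    selected at score x on the matching variable X."""
    return mu_y + r * (sd_y / sd_x) * (x - mu_x)


def expected_regression_effect(x, pop_a, pop_b, sd_x, sd_y, r):
    """Expected Y difference (A minus B) between groups matched at x."""
    y_a = expected_y_at_match(x, pop_a["mu_x"], pop_a["mu_y"], sd_x, sd_y, r)
    y_b = expected_y_at_match(x, pop_b["mu_x"], pop_b["mu_y"], sd_x, sd_y, r)
    return y_a - y_b


# Hypothetical populations: they differ on the matching variable only.
pop_a = {"mu_x": 110.0, "mu_y": 50.0}
pop_b = {"mu_x": 100.0, "mu_y": 50.0}
effect = expected_regression_effect(
    x=105.0, pop_a=pop_a, pop_b=pop_b, sd_x=15.0, sd_y=10.0, r=0.6
)
```

Under these assumptions the two groups are matched at the same score on X, yet each regresses toward its own population mean on Y, so a difference of about 4 points appears even though the populations do not differ on Y at all.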


Tinkelman, Sherman. “Civil Service Test Item Preparation; 
A Case Study.” Public Personnel Quarterly, III (1942), 
3-74. 

The author traces the evaluation of test items to be used 
in a civil service examination, discussing source material, 
validity, public relations impact, and revision of the items. 
Robert L. Cramer. 


Wolfle, Dael L. “Factor Analysis in the Study of 
Personality.” Journal of Abnormal and Social Psychology, 
XXXVII (1942), 393-397. 

A review of the previous studies in this field singled out 
seven factors, each of which had appeared in three or more 
studies. These were will, cleverness, shyness, self-confidence, 
fluency, depression, and hypersensitivity. The author noted 
two important characteristics of these personality factors. 
They will sometimes duplicate each other or sometimes cut 
across. Chief emphasis was placed upon the statement that 
factor analysis provides a powerful analytic tool for isolating 
the important variables of human personality and that the 
results thus obtained depend on the evaluation by clinicians and 
experimentalists. K. S. Yum. 







Yum, K. S. “Student Preferences in Divisional Studies and 
Their Preferential Activities.” Journal of Psychology, 
XIII (1942), 193-200. 

The Kuder Preference Record was given to 193 college 
students for a study of their preferential interests in the seven 
major types, namely, scientific, computational, musical, artistic, 
literary, social service, and persuasive activities. The author 
found that the comparison of the mean profiles of the students 
in the physical, biological, and social sciences as well as the 
comparison of the mean profiles of men and women were 
significantly and consistently different on some of the major 
types of preferences. The correlation coefficients between the 
preference scores and academic achievement were negligible 
except in the case of the literary activities for the entire group 
and also for the group of men, and the computational 
activities for the group of women. Louise T. Grossnickle. 






